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Chapter  1 
INTRODUCTION 


One  of  the  most  important  services  that  the  operational  meteorologist  can  per- 
form is  to  give  the  probability  that  a specific  weather  event  will  occur.  For  very 
long-range  forecasts,  climatological  probabilities  generally  are  used.  Climatologi- 
cal probabilities  have  been  prepared  for  a wide  variety  of  events  — high  winds, 
extreme  temperatures,  low  visibility,  heavy  precipitation,  etc.  Indeed,  climatolog- 
ical probabilities  have  already  been  prepared  for  most  operationally  important  events. 

For  shorter  range  forecasts,  many  statistical  methods  are  available  for  relat- 
ing the  probability  of  a future  event  with  observed  or  forecast  weather  elements 
(e.g.,  Panofsky  and  Brier,  1965).  Using  these  methods,  it  generally  is  desirable 
to  have  a large  set  of  data  to  generate  the  probability  estimates.  In  some  cases, 
a sufficient  data  set  is  difficult  to  obtain,  particularly  if  some  of  these  data 
come  from  a numerical  weather  prediction  modei  since  these  models  tend  to  change  as 
improved  versions  are  put  into  operation. 

The  purpose  of  this  report  is  to  describe  a multivariate,  "t ransnormalized" 
method  that  makes  explicit  use  of  climatological  probabilities  and  requires  a rela- 
tively small  data  base.  This  method  is  called  the  climatologically  transnormalized 
regression  probability  model,  briefly  identified  as  the  TRP  model. 

The  TRP  model  allows  predictand  categories  to  be  variable  and  easily  changed  in 
operational  use.  Categorical  and  continuous  variables  can  be  used.  Compared  with 
other  methods  used  in  statistical  forecasting,  such  as  discriminant  analysis,  TRP  is 
favorably  structured  for  modeling  in  time  and  space.  Considerable  success  has  been 
obtained  in  developing  the  model  for  one  location  and  applying  it  to  another.  Veri- 
fication analysis  of  the  model  leads  to  a framework  for  Judging  the  accuracy  and 
correcting  biases  in  probability  forecasts. 

The  TRP  model  consists  of  three  main  procedures:  transnormalization,  correla- 
tion, and  regression  conditional  probability.  These  procedures  are  not  new.  Trans- 
normalization and  correlation  were  developed  by  Edgeworth  (1898),  Pearson  (1895), 
and  others  before  the  turn  of  the  century.  Although  regression  probability  is  a 
straight-forward  result  of  multivariate  normal  analysis,  its  potential  when  combined 
with  transnormalization  and  correlation  has  not  been  exploited.  The  model  seems 
particularly  well  suited  for  meteorology  where  the  climatology  of  most  elements  is 
known  and  the  correlation  between  many  pairs  of  variates  is  known  or  can  be  modeled. 

The  three  parts  were  first  combined  into  meteorological  models  by  McCabe  (1968) 
and  Gringorten  (1968,  1971,  1972).  These  models  were  presented  at  the  First  Statis- 
tical Conference  of  the  American  Meteorological  Society  in  Hartford,  Connecticut  in 
1968.  Further  development  of  the  TRP  model  was  stimulated  by  correspondence  between 
Gringorten  and  the  author  while  the  author  was  stationed  in  Tokyo,  Japan.  Optimiz- 
ing the  model  and  applying  it  to  various  operational  forecasting  problems  was 
judged  to  be  the  best  application  of  climatology  in  Air  Weather  Service  in  1972. 
Verification  of  model  forecasts  was  presented  at  the  Third  Conference  on  the  Use  of 
Probability  and  Statistics  (Boehm,  1973).  Multiple  predictor  equations  for  the 
model  were  derived  in  1973  and  final  development  of  efficient  transnormalization 
algorithms  was  completed  in  197^. 

This  report  is  organized  as  follows:  First,  some  statistical  forecasting  terms 
are  defined,  and  an  overview  of  the  TRP  model  is  presented  with  an  introduction  to 
the  three  procedures.  Next,  several  transnormalization  methods  are  presented,  cor- 
relation for  continuous  and  categorized  variables  is  described,  and  the  regression 
probability  equation  is  derived.  Finally,  some  examples  and  applications  are 
described. 


1 


L __ ■ J 


fi-eced 'Papa  ~ " 


December  197<> 


AWS  Technical  Report  75-259 


Chapter  2 

DOME  STATISTICAL  FORECASTING  TERMS  AND  VIEWPOINTS 


2.1.  Forecast  States  and  Moles. 

Typically,  a forecast  method  goes  through  four  states  of  implementation:  devel- 
opment, testing,  evaluation,  and  operational  use.  A statistical  forecast  method  is 
developed  by  using  previous  observations  to  fit  parameters  of  a given  statistical 
model.  Cue  important  part  of  the  development  state  is  collecting,  checking,  and 
building  a data  base  file.  This  set  of  previous  observations  is  termed  the  develop- 
ment or  dependent  data  set.  Once  the  method  is  developed,  It  usually  is  teste  i on 
a new  data  set  — the  Independent  data  set.  If  the  procedure  performs  satisfactorily, 
it  may  then  be  evaluated  in  an  operational  forecast  setting  and  finally  operationally 
used  to  give  forecasts  to  a 'ustomer.  The  four  states  are  desirable  but  not  m&r.ia- 
tory,  and  alternate  descriptions  are  used.  See,  for  example.  Panofsky  and  Brier 
(1965,  p.  17*0  or  Air  Weather  Service  Technical  Report  105-38  (lr‘L3). 

Sometimes  it  is  necessary  to  differentiate  between  the  development  mode  arid  the 
forecast  mode.  The  development  mode  uses  a statistical  procedure  to  fit  parameters. 
The  three  stages  — testing,  evaluation,  and  operational  use  — utilize  the  forecast 
mode  of  the  statistical  forecast  procedure.  Both  the  development  mode  ana  forecast 
mode  use  observations  as  input,  but  the  output  of  the  development  mode  is  param- 
eters, and  the  output  of  the  forecast  mode  is  a forecast. 


Objective  and  Subjective  Procedures. 


Any  of  the  stages  or  either  of  the  modes  may  be  done  objectively  or  subjective- 
ly. An  objective  procedure  Is  a set  of  rules  that  will  give  the  same  resr.it  if 
applied  by  different  individuals  or  computer  systems.  A subjective  procedure 
depends  on  the  past  experience  of  the  individual,  and  two  individuals  may  get  dif- 
ferent results.  Certain  computers  can  be  programmed  to  use  subjective  methods; 
that  is,  the  result  at  any  given  time  will  depend  on  the  previous  experience  of  the 
computer.  For  example,  refer  to  Learning,  Machines  (Nilsson,  1965)*  The  term 
objective  forecast  is  sometimes  used  synonymously  with  statistical  forecast.  How- 
ever, a nonstatistical  method,  e.g.,  dynamic,  can  also  be  used  to  produce  objective 
forecasts . 


Any  of  the  stages  or  either  of  the  modes  may  be  done  by  automated  or  manual 
means.  A computer  can  be  used  to  develop  a method  tha*  will  be  used  manually,  or  a 
person  can  manually,  e.g.,  graphically,  develop  a method  that  will  be  used  or.  a 
computer. 

2.3-  Statistical  Terms. 

Two  other  terms,  empirical  and  stochastic,  are  used  ir.  certain  cases  in  place  of 
statistical.  An  empirical  method,  to  paraphrase  Webster's  dictionary  (Friend  and 
Guralnik,  1959)*  is  based  solely  on  experiment  and  observation;: . A statistical 
met  ho  d is  be..---  observations,  but  these  are  handled  according  to  the  science  f 
statistics . A stochastic  method  involves  randomness.  Stochastic  methods  Include 
atlstical  methods,  since  collecting  observations  car.  be  thought  of  as  a ran 
process.  However,  a stochastic  forecast  model  can  be  develops  without  the  use  of 
observations,  e.r:.,  a model  based  on  Brownian  motion  theory. 

Some  statistical  procedures  require  more  data  than  others  to  obtain  the  same 
degree  of  accuracy.  If  a procedure  needs  relatively  few  data  it  Is  said  to  be  an 
.'fie lent  procedure.  Efficient  proce  lures  which  depend  on  various  assumptions  in 
the  statistical  model  are  calle  . parametric , since  the  basic  form  is  assumed  and 
only  parameters,  which  are  limited  in  effect,  can  be  changed.  If  the  basic  assump- 
tions are  correct  or  approximately  correct,  then  the  procedure  will  yield  reliable 
results;  however,  if  the  assumptions  are  poor,  then  a great  deal  of  data  will  not 
make  the  procedure  work  well.  In  contrast  to  parametric  methods,  nonparametrlc 
methods  reauire  few  assumptions  but  need  more  data.  (Refer  to  Kanal  ar.d  'handrase- 
karan,  1971,  for  further  information  on  the  amount  of  data  needed.'  More  informa- 
tion on  statistical  methods  can  be  found  in  Panofsky  and  brier  (1965),  (Jringorten 
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(1955).  ami  Air  Weather  Service  Technical  Report  105-38,  Short  Range  ana  Extenie j 
Forecasting  by  Statistical  Methods  (1948).  Glahn  (1975)  har  < iven  ar,  up-to-  iat1 
account  with  an  excellent  bibliography . 

2.4.  Probability  Terms. 

Although  probability  can  be  defined  several  ways  (Chacko,  1971),  it  is  most 
often  defined  in  meteorology  as  the  relative  frequency  of  an  event.  The  relative 
frequency  concept,  however,  is  difficult  to  extend  to  rare  events  which  have  occur- 
red only  once  or  twice,  or  perhaps  not  at  all.  In  this  situation,  the  more  natural 
viewpoint  is  given  by  subjective  probability  which  is  definej  as  a personal  degree 
of  belief  (Savage,  1954).  Decision  theory  requires  probability  forecasts  to  opti- 
mize use  of  resources  because  probability  forecasts  have  more  utility  than  a cate- 
gorical forecast  of  the  same  skill  (Thompson  and  Brier,  1955)*  The  probability 
notation  Pr(Y)  denotes  the  probability  of  the  event  In  the  case  of  continuous 
variables  the  notation  Pr(Y<^)  denotes  the  probability  that  the  random  variable  Y, 
e.g.,  temperature,  will  be  less  than  the  specified  value  y,  e.g.,  32®P.  The  Joint 
probability  of  the  two  events  X and  Y,  Pr(X,Y),  denotes  the  probability  that  X and 
Y will  both  occur. 

Conditional  probability  is  the  probability  of  an  event  based  on  a given  set  of 
information,  e.g.,  the  probability  of  a thunderstorm  tomorrow  if  a thunderstorm 
occurred  today.  Conditional  probability  implies  that  no  other  information  is  given 
except  the  specified  set.  That  all  information  is  not  given  or  available  is  what 
makes  probability  a useful  concept.  If  we  knew  for  sure  that  a thunderstorm  would 
occur  tomorrow,  we  would  not  need  to  use  probability.  Thus,  all  probability  state- 
ments are,  in  fact,  conditional  probability  statements. 

Conditional  probability  uses  the  notation,  Pr(Y|X),  which  is  read:  the  prob- 
ability of  (the  event)  Y given  that  (the  event)  X has  occurred.  In  forecasting 
terms,  X is  the  predictor  and  Y is  the  predictand.  If  a second  predictor  (2)  is 
added,  Pr(Y|X,Z),  the  conditional  probability  is  different.  Thus,  the  original 
conditional  probability,  (Pr(X|Y),  could  be  read:  the  probability  of  Y given  in- 
formation about  X but  not  given  any  information  about  Z or  any  other  information. 

In  practice,  however,  some  other  information  is  usually  implied  and  not  specifically 
added  in  the  conditional  probability  notation,  e.g.,  the  fact  that  it  is  St.  Louis, 
Missouri  in  July  for  which  the  thunderstorm  probability  is  calculated  may  r.ot  appear 
explicitly  in  the  conditional  probability  notation.  A great  deal  of  confusion  in 
probability  forecasting  results  if  the  implied  information  is  not  specified  before- 
hand. 
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Chapter  3 

OVERVIEW  OF  THE  MODEL 


This  chapter  contains  a qualitative  description  of  how  the  three  procedures  - 
transnormalization,  correlation,  and  regression  probability  — fit  together  to  make 
an  effective  forecast  method.  In  brief,  equations  are  known  for  calculating  con- 
ditional probability  given  the  multivariate  normal  distribution.  Since  many  mete- 
orological variables  are  not  normally  distributed,  they  are  transformed  by  a func- 
tional relationship  into  a normal  distribution.  Thlc  process  is  called  trans- 
normallzatlon.  Correlation  is  then  calculated  between  each  pair  of  trans normalized 
variables.  The  resulting  correlations  make  up  the  correlation  matrix  which  is  in- 
verted to  obtain  the  regression  coefficients.  The  multivariate  normal  conditional 
probability  equations  are  then  used  to  calculate  a probability  forecast.  In  thi6 
chapter  each  procedure  is  described  in  qualitative  manner  and  mathematical  details 
are  presented  in  the  following  three  chapters. 

3.1.  The  Development  Mode. 

The  development  mode  consists  of  three  steps:  transnormallzation,  correlation, 
and  calculation  of  regression  coefficients.  The  dependent  set  of  data  and  the 
transnormalizer  routine  are  input  to  the  development  mode;  the  regression  coeffi- 
cients are  the  output.  Correlation  is  used  in  the  development  mode  but  is  not 
required  in  the  forecast  mode. 

3.1.1.  Transnormalizer  Development. 

First,  a method  must  be  developed  to  transform  the  observed  or  raw  predictor 
into  the  value  of  the  standard  normal  variable  that  has  the  same  cumulative  proba- 
bility as  the  raw  predictor  (equivalent  normal  deviate,  END).  This  transformation 


CUMULATIVE 


Figure  1.  Transnormalization  of  a Variable.  A variable  is  related  by 
its  cumulative  probability  to  an  equivalent  normal  deviate  (END). 
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is  based  on  the  predictor's  climatological  cumulative  distribution.  For  example, 
suppose  a 10,000-ft  or  less  ceiling  occurs  60j6  of  the  tiiw.  A cumulative  probabil- 
ity of  6056  converts  to  an  END  of  0.26  using  the  normal  ogive  (see  Figure  1). 

Tables,  normal  graph  paper,  and  computer  algorithms  are  available  to  convert  from 
the  cumulative  frequency  to  the  END.  Panofsky  and  brier  (196%  p.  ^1)  give  a 
description  of  a graphical  procedure. 

The  term  "transnormallzed"  is  used  to  emphasize  that  this  is  a nonlinear  trans- 
formation. The  result  is  not  Just  a "standard  variable"  where  the  mean  hat  been 
subtracted  and  the  difference  divided  by  the  standard  deviation.  A transnormallzer 
symbol  (see  Figure  2)  shows  that  the  result  depends  not  only  on  the  raw  variable  — 
the  direct  predictor  — but  also  on  marginal  predictors.  A marginal  predictor  may 
be  thought  of  as  a condition  or  constraint  on  the  distribution  of  the  direct  pre- 
dictor; therefore,  it  changes  the  shape  of  the  cumulative  distribution  of  the  direct 
predictor.  Marginal  predictors  do  not  need  to  be  transnormallzed  and,  in  fact, 
usually  do  not  have  a distribution  per  se  since  the  values  of  marginal  predictors 
are  usually  selected  by  man  rather  than  by  nature.  Typical  marginal  predictors  ar<- 
time  of  day,  location,  time  of  year,  and  weather  modification  effects.  Char’1  I 
contains  further  information. 

MARGINAL  PREDICTOR(S) 

e.g.,  TIME  OF  DAY 

0900L 


DIRECT  J'REDICTOR 


e.g.,  CEILING  10,000  FT 


Figure  2.  Symbol  for  a Transnormallzer.  The 
transnormallzer  symbol  shows  pictorially  the  flow 
of  information  and  emphasizes  the  effect  of  the 
marginal  predictors  on  the  transformation. 


3.1.2.  Correlation  Overview. 

The  next  step  in  the  development  moot  consists  of  finding  the  simple  -rria'i: 
between  each  pair  of  transnormallzed  variables  — Including  all  the  predictors  an 
the  predictanci.  If  both  variables  of  the  correlation  pair  are  continuous.  c . .. 
temperature,  then  the  common  correlation  formula,  sometimes  called  Pearson':  Produ; 
Moment  formula, is  used  (Pearson,  1895).  If  one  variable  is  continuous  an  1 Uv  vthe* 
is  dichotomous  (e.g.,  rain,  no  rain),  the  formula  for  bl serial  correlate  use: 

(Pearson,  1909).  If  both  variables  are  dichotomous,  the  tetrachorio  correlatio: 
formula  can  be  used  (Pearson,  1913a).  Chapter  5 presents  correlation  formula- 
Guilford  and  Fruchter  (1973)  give  a good  descriptive  presentation  of  correlatior 

3.1.3.  Regression  Coefficients  Overview. 

The  regression  coefficients  are  calculated  using  the  same  steps  a^  ruinar. 
multiple  linear  regression.  The  matrix  of  simple  Lntercorrelations  betweer  • 

• is  inverted.  The  inverted  matrix  is  multiplied  by  the  row  vector  cor.cis 
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of  correlations  between  each  predictor  and  the  preiictanl.  The  resulting  vector 
contains  the  coefficients  needed  to  calculate  conditional  probability  using  multi- 
variate normal  :i3tributlon.  Chapter  6 presents  the  ietails  and  formulas. 

3.2.  The  Forecast  Hole. 

Given  the  transnorraalizer  routines  anc  the  repression  coefficients,  the  fore- 
cast mode  calculates  the  conditional  probability  for  each  set  >f  predictors. 

.2.1.  Transnormalization  anl  ttn-  M-.-ar.  Predictor. 

In  the  forecast  node,  each  direct  predictor  is  first  t ransnormadlzed  with  the 
marginal  predictors  affecting  the  transnormalization.  The  transno realized  predic- 
tors are  each  multiplied  by  their  respective  regression  coefficients  and  the  results 
summed  to  get  the  "mean"  predictor.  The  term  mean  predictor  is  used  since  the  sum 
just  describe  : appears,  ii.  the  same  position  in  the  conditional  probability  equation 
as  the  ordinary  mean  does  whcr.  a variable  is  standardized. 

3.2. ?.  Conditional  Probability . 

Next,  the  conditional  probability  is  calculate!  usin/  the  multivariate  normal 
distribution.  Tnsight  int.-  this  calculation  can  be  obtained  by  considering  the 
simplest  case  of  one  predictor.  In  this  case  the  bivariate  normal  distribution  is 
used.  The  bivariate  normal  is  a bell-shaped  Joint  distribution  of  two  variables. 
Each  variable  by  itself  shows  a univariate  normal  distribution,  and  since  trar.s- 


tATER 


8t  30 1 

2O0  FT  1000  FT 


figure  3-  Bivariate  Normal  Conditional  Probability.  The 
bivariate  normal  is  e bell-shapei  volume  with  the  maximum 
density  at  tne  means  which  are  equal  to  zero  with  transnor- 
malizei  variables.  Isopleths  of  equal  density  are  ellipses 
(circles  for  zero  correlation)  which  become  more  eccentric 
as  correlation  increases.  Conditional  probability  of  a 
predictant  category  is  given  by  the  joint  probability  (the 
shade  volume)  divided  by  the  probability  of  the  predictor 
category. 


7 


AWS  Technical  Report  75-259 


December  1976 


normalized  variables  are  used,  the  mean  of  each  distribution  is  zero  and  the  stand- 
ard deviation  is  one.  If  the  two  variables  are  correlated,  isopleths  of  equal 
probability  density  are  ellipses  as  shown  in  Figure  3- 


If  we  consider  one  of  the  variables  as  the  predictor  and  the  other  variable  as 
the  predictand,  we  can  calculate  the  probability  that  the  predictand  will  be  within 
a specified  range,  i.e.,  a category,  given  that  a predictor  value  has  occurred 
within  some  specified  range.  For  example,  we  can  calculate  the  probability  that  a 
ceiling  4 hours  from  now  will  be  in  the  category  200  feet  to  1000  feet  given  that 
the  ceiling  now  is  in  the  200-ft  to  1000-ft  category.  This  conditional  probability 
is  given  by  dividing  the  joint  probability  of  the  predictor  category  and  the  pre- 
dictand category  by  the  probability  of  the  predictor  category.  Unfortunately,  the 
joint  probability  cannot  be  calculated  except  by  numerical  means  such  as  numerical 
integration  or  an  infinite  series.  However,  as  the  predictor  category  shrinks  to  a 
specific  predictor  value,  the  conditional  probability  remains  finite,  and  the  limit- 
ing equation  becomes  quite  simple  (Gringorten,  1972): 

C - (Y  - rp)/y  1 - r2  (3.1) 

II 

where  C is  the  END  of  the  conditional  probability 

ii 

Y is  the  END  of  climatological  probability  of  the  predictand 
r is  the  correlation  between  predictor  and  predictand 
p is  the  END  of  the  predictor 

Derivation  of  this  equation  is  given  in  Chapter  6.  When  more  than  one  predictor 
is  used,  the  "mean"  predictor,  which  is  a linear  combination  of  the  original  pre- 
dictors, acts  the  same  as  a single  predictor  and  the  bivariate  normal  equation  can 
be  used. 

3.2.3.  Inverse  Transnormalization. 

The  result  of  the  conditional  probability  equation  is  a transnormalized  vari- 
able. Inverse  transnormalization  is  required  to  produce  a probability.  Inverse 
transnormalization  is  simply  the  integral  of  the  univariate  normal  distribution 
from  minus  infinity  to  the  value  of  the  transnormalized  variable.  An  exact  simple 
formula  for  this  integral  is  not  known.  However,  many  approximations  and  expan- 
sions are  available  (Abramowitz  and  Stegun,  1964)  so  that  on  a computer  it  is  no 
more  difficult  than  evaluating  a logarithm  or  a cosine.  The  same  symbol  is  used 
for  direct  and  inverse  transnormalization,  but  the  inputs  and  outputs  are  reversed 
(Figure  4). 


- PROBABILITY  (F  < f 


1 2 
- 2 x 


dx 


P) 


Figure  4.  The  Inverse  Transnormalizer . The 
inverse  transnormalizer  symbol  shows  pictori- 
ally  the  transformation  of  an  END, shown  as  Ci 
into  a cumulative  probability  C.  Inverse 
transnormalization  as  used  here  ends  with  the 
cumulative  probability;  no  attempt  is  made  to 
convert  the  cumulative  probability  into  the 
equivalent  value  of  the  predictand.  Inverse 
transnormalization  is  simply  the  normal  prob- 
ability integral  function. 
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3.2.4.  Information  Flow  In  the  Forecast  Mode. 

The  flow  of  Information  and  the  operations  performed  on  the  Information  are 
shown  in  Figure  5.  The  final  probability  depends  on  both  direct  and  marginal  pre- 
dictors. One  of  the  most  important  effects  on  the  final  probability  is  the  effect 
the  marginal  predictors  have  on  the  climatological  probability  of  the  predictand. 


Figure  5.  The  TRP  Forecast  Mode  Raw  Predictors  p^,  Pg,  ...  p^ 
are  Transnormalized  into  END  Variables  p-^,  Pg,  ...  p„.  The 
marginal  predictors  affect  the  transnormalizers  by  changing 
the  shape  of  the  cumulative  distributions.  Transnormalized 
variables  p are  multiplied  (dot  product)  by  the  regression 
coefficients  A to  give  the  scalar  M called  the  mean  predictor. 
The  marginal  predictors  also  affect  the  transnormalization  of 

II 

the  predictand.  The  transnormalized  predictand  f is  combined 
with  the  mean  predictor  M and  the  multiple  correlation  coeffi- 
cient R to  produce  the  END  of  conditional  probability  C which 
is  inverse  transnormalized  to  give  the  conditional  probability 
C. 
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Chapter  4 

TRANSNO RMALIZATION 


4.1.  Reasons  for  Transnormalization. 

Transnormallzatlon  is  the  procedure 
equavelent  normal  deviate  (END).  An 
END  is  related  to  the  variable's  cumu- 
lative frequency  of  occurrence  by  use 
of  the  cumulative  normal  distribution. 
The  term  "transnormalized"  is  used  to 
emphasize  that  this  is  a nonlinear 
transformation.  The  result  is  not  just 
a standard  variable  where  the  mean  has 
been  subtracted  and  the  difference 
divided  by  the  standard  deviation. 

One  purpose  of  the  transnormaliza- 
tion process  is  to  insure  that  the 
predictor  is  normally  distributed. 
Another  purpose  is  that  many  non- 
linear effects  are  taken  into  account. 
For  example,  any  continuous  one-to- 
one  relationship  will  become  linear  if 
both  variables  are  transnormalized. 

Of  course,  not  all  nonlinear  relation- 
ships can  be  taken  into  account  (see 
Figure  6). 

In  practice  there  are  several  ways 
to  transnormalize  a variable.  Graphical 
methods  can  be  accomplished  manually. 

Two -stage  methods  include  histogram 
interpolation,  rank  order,  and  frequen- 
cy curve  fitting.  Direct  methods  use  a 
into  a transnormalized  variable. 

4.2.  Graphical  Transnormalization. 


that  changes  an  observed  variable  into  its 

-4/1  r 

i /•  | h 

1 7 

I — 7H 

-4/ 


/i 


Figure  6.  Linearity  Caused  by  Transnor- 
malization. If  x and  y are  related  by 
any  arbitrary  monotonically  increasing 
function,  y = M(x),  and  both  the  vari- 
ables are  transnormalized,  the  resulting 
relation  will  be  linear.  The  proof  is 
simple:  If  events  less  than  X£  occurred 

n times  then  n events  will  also  have 
occurred  less  than  y^  - M(x/  because  x 
and  y are  monotonically  increasing 
functions  of  each  other.  Thus,  x and  y 
will  have  the  same  cumulative  probabil- 
ity for  any  xi  and  its  related  yi}  and 
x and  y will  be  linearly  related. 


formula  to  convert  the  raw  variable  directly 


The  graphical  method  requires  first  that  a series  of  observations  be  classifies 
into  categories  or  classes  specified  by  class  boundaries.  Next,  the  frequencies  in 
each  category  or  class  are  converted  into  a cumulative  distribution  and  graphed  as 
an  ogive.  Panofsky  and  Brier  (ly65)  give  details  and  examples  of  graphing  the 
cumulative  diatributi 

T us<  ;h  igivi  , entei  . i raw  value  of  the  predictor  and  re  a i 

off  .he  predictor's  :umula  Llity.  nal  y,  the  . nulative  prob- 

ability to  an  equivalent  normal  i.viate  (END)  y use  of  normal  ••  raph  papv r or  a 
table  of  normal  cumulative  probabilities.  The  irawbacks  of  the  graphical  met ho  i 
are  that  it  is  a manual  (slow)  method  and  some  ■ rror  i s intr  i y ini  Lai 
on  the  graph. 

4.3.  Two-;: tags  Tranrm  . 

Two-stage  methods  consist  of  an  algorithm  to  convert  the  raw  predictor  to  it: 
cumulative  probability  and  then  a second  algorithm  to  convert  the  cumulative  prob- 
ability to  an  equivalent  normal  icviate.  There  are  two  general  kir.  is  of  algorithms 


to  convert  the  raw  predictor  to 


cumulative  probability : 


which  is  the  automated  equivalent  of  the  graphical  metho:!,  and  frequency 


histogram  lnterpolatl 


curve 


fitting,,  which  fits  parameter,  t a selected  iistributi  'n  or  fit!  the  coefficient: 
of  a series  expansion  cf  a iistribution. 
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4.3.1.  Histogram  Interpolation. 

The  histogram  Interpolation  method  requires  first  that  class  Intervals  be  speci- 
fied. With  frequency  histograms  the  number  of  classes  can  be  optimized.  Fewer, 
larger  classes  mean  more  data  in  each  class,  which  implies  a smaller  error  due  to 
sampling.  Smaller,  more  numerous  classes  have  greater  sampling  error,  but  interpo- 
lation error  is  smaller.  Panofsky  and  Brier  (1965,  p.  3)  give  the  rule  of  thumb 
that  the  number  of  classes  should  equal  five  times  the  common  logarithm  of  the 
number  of  data  values.  However,  the  rule  appears  not  to  apply  for  minimizing  error 
in  a cumulative  distribution  since  the  error  in  this  case  does  not  depend  on  the 
number  of  cases  in  a class  but  rather  on  the  total  number  below  a value.  It  is 
suggested  that  to  keep  the  error  low,  use  as  many  classes  as  possible  while  still 
getting  at  least  one  observation  per  class. 

Once  the  classes  have  been  specified,  the  algorithm  for  using  a linear  histogram 
interpolation  method  is  as  follows: 

First,  accumulate  data  in  the  histogram. 

Given  U = the  upper  bound  of  the  histogram 
L = the  lower  bound 
N = the  number  of  classes 
T = the  total  number  of  observations 

Then  the  class  interval  (C)  is  given  by: 

C = (U  - L)/N  (4.1) 

An  observed  value  (X)  is  placed  in  category  I where  I is  given  by: 

I = (X  - L)/C  + 1.001  (4.2) 

where  I is  truncated  to  an  integer  value. 

If  I < 1,  set  I to  1 

If  I > N,  observation  is  not  classified 

When  an  observed  value  (X)  falls  on  a boundary,  the  value  1.001  in  Equation  (4.2) 
causes  X to  be  placed  in  the  upper  class  rather  than  the  lower  class.  This  effect 
and  the  limits  that  are  imposed  on  the  values  of  I cause  the  cumulative  probability 
of  a category  I (given  by  Equation  (4.3)  below)  to  be  tne  probability  that  the 
variables  in  that  category  are  less  than  or  equal  to  the  upper  boundary  value  of 
the  category.  If  F(l)  is  the  frequency  of  category  I using  the  above  rules,  then 
the  cumulative  probability  or  ogive  (0(1))  of  category  I is  given  by: 

0(1)  = F(l)/T 


0(1)  = [F(I)/T]  + 0(1  - 1),  I = 2,  3,  ...,  N (4.3) 

Note  that  0(N)  will  not  equal  1 if  some  observation  were  greater  than  U.  To  find 
the  cumulative  probability  (P)  equal  to  or  less  than  some  arbitrary  value  (X), 
some  category  (J)  is  selected  first.  Next,  calculate  the  fraction  (F)  between  cate- 
gories: 

F = G - J (4.4) 

where  G = [(X  - L)/C]  +1  (G  is  simply  an  intermediate  result) 
and  J = INT(G)  INT  is  the  integer  function,  e.g.,  INT(2.3)  = 2 
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If  J < 1,  set  J = 1 

If  J > N - 1,  set  J = N - 1 

The  fraction  (F)  between  categories  (Equation  (4.4))  is  used  to  linearly  interpo- 
late P: 


P = (1  - F)O(J)  + F 0 (J  + 1)  (4.5) 

For  values  beyond  the  observed  data,  the  linear  extrapolated  probability  may  be 
below  zero  or  greater  than  one.  To  preclude  this  from  happening,  the  following 
algorithm  is  recommended: 

If  P < 0.7/T,  set  P = 0.7/T 

If  P > 1 - 0.7/T,  set  P = 1 - 0.7/T 

The  term  0.7/T  comes  from  an  extreme  value  theory  using  the  Poisson  distribution. 

Its  derivation  is  in  Appendix  A. 

4.3.2.  Rank  Order  Transnormalization. 

To  rank  order  a set  of  data  means  to  sort  it  putting  the  smallest  (or  most  nega- 
tive) value  first  and  the  largest  value  last.  Once  the  sort  has  been  completed, 

the  cumulative  probability  can  be  directly  assigned  to  each  observation.  If  T is 
the  total  number  of  observations,  then  the  first  (lowest)  observation  has  the  opti- 
mal estimate  of  cumulative  probability  of  1/(T  + 1),  the  second  has  2/(T  + 1),  etc. 
(Panofsky  and  Brier,  1965,  P-  43). 

In  practice,  ties  (two  or  more  observations  with  the  same  value)  often  occur. 

In  this  case  the  straightforward  algorithm  fails  and  instead  the  following  algorithm 
is  recommended: 

Let  T be  the  total  number  of  observations 

S be  the  number  of  observations  with  values  less  than  the  Ith  observation 

H be  the  number  of  observations  with  values  greater  than  the  Ith  observation 

then  P(I)  is  the  mean  cumulative  probability  of  the  Ith  observation: 

P(I)  = (S  + T - H)/2T  (4.6) 

This  algorithm  is  easy  to  program,  and  the  mean  cumulative  probability  gives  better 
results  for  calculating  correlation  than  the  less-than-or-equal-to  cumulative  prob- 
ability. 

The  rank  order  method  gives  accuracy  equal  to  or  better  than  the  histogram 
method.  However,  the  rank  order  method  has  three  drawbacks.  First,  all  of  the 
observations  must  be  stored  since  they  are  used  more  than  once.  The  histogram 
method  requires  only  the  frequency  in  each  class  to  be  stored.  Second,  the  rank 
order  method  requires  a sort  where  the  number  of  operations  (multiplications,  addi- 
tions, etc.)  is  of  the  order  T2,  whereas  the  number  of  operations  in  the  histogram 
and  other  methods  to  be  described  is  of  the  order  T.  The  difference  in  the  number 
of  operations  can  make  the  rank  order  method  expensive  in  computer  time  for  large 
amounts  of  data.  Third,  the  rank  order  method  requires  a search  when  the  cumulative 
probability  of  a new  value  is  to  be  calculated.  Once  the  search  has  found  the 
nearest  stored  observation  greater  than  the  new  value,  an  interpolation  scheme 
similar  to  Equation  (4.5)  can  be  used.  The  subroutine  ATSG  in  the  IBM  System/360 
Scientific  Subroutine  Package  is  efficient  in  doing  the  required  search. 

The  above  drawbacks  limit  the  usefulness  of  the  rank  order  method.  Storage 
requirements  make  it  difficult,  if  not  impossible,  to  program  on  the  small  program- 
mable calculators.  Nevertheless,  it  has  been  used  effectively  for  sets  of  observa- 
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tions  that  number  less  than  100  and  when  using  only  the  dependent  data  since  with 
depen  lent  iata,  END  can  be  assigned  to  each  observation  an  i no  search  or  interpola- 
tion routine  is  necessary. 

4.3.3*  Using,  a Known  Distribution. 

For  certain  meteorological  parameters,  the  algebraic  form  of  the  climatological 
listri button  is  known.  For  example,  the  gamma  distribution  Is  often  use i for  rain- 
fall. In  this  case,  the  parameters  of  the  distribution  are  fitted  or  subjectively 
estimated.  Then  a procedure  must  be  found  to  give  the  cumulative  probability  (F) 
for  any  giver,  value  (x).  This  procedure  is  equivalent  to  integrating  the  frequency 
distribution  f(x): 

x 

F(x)  = f(x)dx  (4.7) 

k) 

_ os 


If  the  indefinite  integral  of  f(x)  is  known,  it  can  be  used.  If  not,  rational 
approximations  are  sometimes  available.  For  example,  Abramowitz  and  Stegun  (1964) 
give  approximations  for  the  gamma  distribution,  the  beta  distribution,  and  others. 

As  a last  resort,  numerical  integration  can  always  be  used. 

4.3.4.  Unknown  Distribution  — The  Gram-Charlier  Approximation. 

In  1905 , Charlier  described  a series  that  could  be  used  to  approximate  a general 
distribution  (Crd,  1972).  Usually,  only  terms  using  just  the  first  four  moments  are 
used.  In  this  case  the  approximations  for  the  frequency  density  (g)  can  be  written: 

g ( x ) = 0(s)  (1  + K^/8  - SK./6  - s2k^/4  + s3K,/6  + s\4/24)  (4.6) 

where  s is  the  standardized  variable: 


s = (x  - M1)(M2)'t  (4.9) 

Mi  and  M2  are  the  first  and  second  central  moments  (Appendix  C),  and  6(s)  is  giver, 
by  the  relation: 

<t>{  z)  = — EXP(-z2/2)  (4.10) 


where  z is  a dummy  variable.  Kg,  K_ , ana  K^  are  cumulants  calculated  from  the 

moments  (Appendix  C).  Equation  (4.8)  can  be  integrated  by  parts  to  yield  the 
cumulative  probability  (G): 


G(x)  = <Z>(x)  (sK^/6  + sK^/8 


s2K^/6  - s3K^/24)  + 4(s) 


(4.11) 


wnere  01s)  is  given  by  the  relation: 

IT 

P 

<t(p)  = EXP(-  4 z2)  dz 

1 


(4.12) 


where  p is  a dummy  variable.  Methods  of  evaluating  Equation  (4.12)  are  giver,  in 
section  4. 3.  Equation  (4.8)  can  give  negative  values,  particularly  for  extreme 
values  of  6.  Thus  Equation  (4.11)  is  not  always  a monotonically  increasing  func- 

Wh  thii  m tho  ' used,  it  should  be  cl  :ked  first  ’or  significant  negative 
values  which  should  be  set  equal  to  zero. 

t.  ; eworth,  in  16/' , -■  veioped  a similar  expansion  (Ord,  1972).  However,  the 
>Char ha:  a smaller  region  in  whicl,  negative  values  can  occur  (barton  and 
■is,  1952). 
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•’..3.5.  Unknown  Dirt  ri  nuti.in  — The  Pearson  iysteii:  of  Curves. 

Karl  Pearson  based  a general  system  of  frequency  curves  on  an  assume!  form  of 
the  derivative  (f)  of  an  arbitrary  frequency  curve  (Ord,  1972): 

f'  = f (x  - a)/b  - cx  - dx2  (4.13) 

where  x is  the  dependent  variable  and  a,  b,  c,  und  d are  parameters. 

When  Equation  (4.13)  Is  integrated,  it  yields  13  algebraic  forms  including  tiie 
normal,  the  gamma,  the  inverte  i gamma,  beta,  inverted  beta,  and  the  exponential 
distribution.  The  shapes  of  the  distributions  include  bell  shape:,  U shaped,  J 
shaped,  and  saw  tooth.  The  distributions  may  be  unbounled,  bounded  at  one  end,  or 
bounded  at  both  ends.  In  order  to  judge  which  curve  should  be  used,  Pearson  use: 
the  following  criterion  (K): 

K = B1(B2  - 3)/4(4B2  - 3B1)(2B2  - 3BX  - 6)  (4.14) 

where 

B1  = M3/M2  > Bg  = VKI  (4. 15) 

The  criterion  can  range  from  minus  infinity  to  plus  infinity.  Various  ranges 
of  the  criterion  indicate  which  curve  should  be  used  (Brooks  and  Carruthers,  1953* 
p.  122).  For  example,  K = 1 indicates  the  inverted  gamma  should  be  used. 

Once  the  curve  has  been  selected,  parameters  of  the  curve  are  fitted  using  the 
first  four  moments.  Some  of  the  parameters  are  given  by  algebraic  formulas;  others 
must  be  fitted  by  iterative  processes. 

Eldarton  (1953)  has  described  in  great  detail  the  selection  of  curves  and  methods 
of  fitting  the  parameters.  Bouver  (1973)  describes  a set  of  computer  programs  to 
select  and  fit  the  Pearson  curves.  Since  each  of  the  13  algebraic  forms  requir* 
a different  fitting  algorithm,  the  procedures  are  somewhat  cumbersome. 

4.3- 1 . Converting  Probability  to  an  END. 

The  first  stage  of  the  two-stage  method  obtains  the  cumulative  probability  for  a 
given  value  of  a predictor.  The  second  stage  converts  this  probability  into  ar 
equivalent  normal  deviate  (END).  In  other  words,  one  must  solve  for  p given  $(p)  in 
Equation  (4.12).  Hastings  (1955)  has  developed  rational  approximations  to  solve 
for  p.  These  approximations  are  also  given  by  Abra.mowitz  and  Stegun  (1964,  Equa- 
tions 26.  2. 22  and  26.2.23).  One  of  the  rational  approximations  is: 


p = t - a + bt/1  + ct  + dt  , t = J If  (l/p“)  (4.16) 

a = 2.30753,  b = 0.27061,  c = 0. 99229,  d = 0.04481 

The  absolute  error  is  less  than  0.003.  Another  algorithm  which  is  handy  for  the 

small  programmable  calculators  (Joiner  and  Rosenblatt,  1971)  is: 

P = 4.91  [p'i;‘  - (1  - p)‘lZt]  (4.171 

For  0.001  < p < 0.999  absolute  error  is  less,  than  0.05. 

4.4.  Pi  root  Motho  i.y . 


Direct  methods  use  a function  to  directly  convert  the  raw  predictor  value  (xl 
into  its  END  (x ) . Neither  the  probability  density  nor  the  cumulative  probability 
is  calculated  as  ar.  intermediate  result.  If  for  some  reason,  e.g.,  to  check  the 
goodness  of  fit,  probability  is.  needed,  the  END  must  be  transformed  as  shown  in 
section  4.5  — inverse  transnormalization. 
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4.4.1.  The  Cornlsh-Flsher  Expansion. 

Cornish  and  Fisher  (1937)  developed  an  Infinite  series  expansion  based  on  cumu- 
lants  to  directly  transnormalize  a raw  predictor.  Usingiust  the  first  four  terms 
of  the  series  gives  some  desirable  smoothing.  The  END  (x)  is  given  by: 


x = Aq  + Ais  + h?sc  + A3s3 


(4.18) 


where  AQ  = l^Mg-3^2^ 

A1  = 1 - (7/36)  K2Mg'3  - K4M2"2 

Ag  = - K3Mg‘3/2/6 

A3  = K3Mg‘3/6  - K4Mg'2/24 

s is  the  standardized  variable  s = (x  - M^)/./R^ 

i 

and  ^ are  the  first  and  second  central  moments 
K3  and  are  cumulants  (see  Appendix  C) 

4.4.2.  Transnormalized  Quantile  Polynomial  Fitting 

If  the  data  has  been  grouped  into  classes,  the  cumulative  probability  allows 
these  classes  to  be  designated  by  their  quantiles.  (Quantiles  is  a more  general  name 
for  percentiles.)  Let  the  upper  boundary  of  the  ith  category  be  designated  by  x,. 
Also,  let  be  the  transnormalized  cumulative  probability  (quantile)  of  the  ith1 
category  which  can  be  obtained  from  Equation  4.16.  Then,  using  the  two  series  (x^ 
and  x^,  i = 1,  ...  N,  where  N is  the  number  of  classes),  it  is  possible  to  use  poly- 
nomial regression  to  get  x as  a function  of  x.  Standard  polynomial  regression  rou- 
tines can  be  used  such  as  documented  in  the  BMD05R  Polynomial  Regression  program 
(Dixon,  1973).  Thus,  polynomial  equations  of  the  following  form  can  be  generated: 


x = a + bx 


(4.19) 


x = a + bx  + cx 


(4.20) 


+ bx  + cx^  + dx3 


(4.21) 


Where  a,  b,  c,  and  a are  calculated  by  polynomial  regression.  Note  that  x is  the 
raw  variable  not  the  standard  variable . 

Equation  (4.19)  is  simply  a normal  distribution  with  mean  - a/b  and  standard 
deviation  1/b  which  can  be  seen  by  equating  the  standard  form  of  a normal  equation 
with  mean  m and  standard  deviation  s to  the  linear  equation  form: 

x = (x  - m)/s  = a + bx 

Nevertheless,  it  may  be  necessary  to  calculate  the  simple  normal  equation  by  regres- 
sion if  the  data  has  been  previously  categorized  in  such  a fashion  that  it  is  diffi- 
cult to  calculate  the  mean  and  standard  deviation  by  the  usual  method  of  moments. 

The  Equations  (4.20)  and  (4.21)  allow  many  types  of  distributions  to  be  fitted. 
However,  it  is  possible  for  a negative  frequency  to  be  calculated.  This  will  happen 
when  the  equations  have  a negative  slope  implying  that  the  cumulative  frequency  is 
decreasing.  This  could  only  occur  with  a negative  frequency  (see  Figure  7)-  The 
slope  (x')  of  Equation  (4.20)  is: 
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x'  = b + 2cx  (4.22) 

The  slope  is  positive  (usable  range  of 
x)  when: 


r\  v 


-b/2c  < x and  c > 0 
or 

-b/2c  > x and  c < 0 
The  slope  of  Equation  (4.21)  is: 


Figure  7.  Usable  Regions  (dark  lines) 
of  Transnormalized  Polynomials.  Unusa- 
ble regions  give  negative  probabilities. 


b + 2cx  + 3dx^ 


(*.23) 


The  slope  is  positive  (usable  range  of  X)  when: 


-<*>  < x < 


- Jc2  - 3bd 


3d 


and 


- c + 


- 3bd 


< x < ® 


3d 


when 

and 


d > 0 


- c - • c 


-^M<x  < 


3d 


r i 

c + • c - 3bd 
3d 


when 


d < 0 


There  are  several  alternatives  if  the  observed  range  of  the  variable  includes  a 
negative  frequency:  (a)  the  method  can  be  discarded  and  a different  method  used, 
(b)  the  negative  frequency  set  to  zero  when  it  occurs,  or  (c)  if  the  negative  value 
is  small  in  magnitude,  it  can  be  used  in  the  case  of  a predictor  with  the  resultant 
small  error  in  conditional  probability. 

4.4.3.  Johnson's  Family  of  Curves. 

Johnson  (19*9),  to  overcome  the  possibility  of  negative  frequency,  advised  the 
use  of  monotonically  increasing  functions  rather  than  polynomials  for  direct  trans- 
normalizations . In  particular,  he  found  that  functions  of  the  form: 

x = a + b (*  ~ (4.24) 

x = a + b Cr  (x  + k)  (**25) 

x = a + b Slnh”1  (gx  + h)  (4.26) 


are  quite  versatile;  they  can  fit  any  of  the  Pearson  family  of  curves. 

Equation  (4.24)  is  most  useful  for  fitting  distributions  that  are  bounded  at 
both  end  points,  e.g.,  eighths  of  sky  cover  or  relative  humidity.  In  the  bounded 
distribution,  L is  the  lower  bound  and  U is  the  upper  bound.  Numerically,  it  is 
wise  to  make  L slightly  lower  and  U slightly  higher  to  keep  the  argument  of  the 
logarithm  within  limits. 

Equation  (4.25)  gives  the  well-known  lognormal  distribution.  It  is  particular- 
ly useful  for  distributions  that  are  bounded  at  one  end,  e.g.,  ceiling,  visibility, 
of  rainfall.  Often  -k  is  equated  to  the  lower  bound  or  is  given  a slightly  lower 
value . 
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Equation  (4.26)  can  be  used  for  unbounded  distributions,  e.g.,  vertical  notion 
or  pressure. 

If  U and  1.,  k,  or  g and  h are  known,  it  Is  easy  to  calculate  a and  b by  means 
of  linear  regression  or  transno mail zed  quantiles  as  lescribe  i in  section  4.4.2. 
However,  if  bounds  are  not  known,  U and  L,  k,  or  g and  h require  nonlinear  fitting. 
Johnson  (1949)  and  Ord  (1972)  give  several  ways  of  fitting  these  variables.  The 
following  methods  were  levclcped  by  the  author  using  the  Taylor  series  expansion 
for  the  logarithm  and  hyperbolic  sine  functions  (see  Appendix  c) . To  fit  the  log- 
normal  Equation  (4.25),  a third-order  polynomial  regression  is  applied  to  the 
t rans normalized  quantiles  (see  Equation  (4.21)).  Thus,  k is  fitted: 

k = - 2/3  c/d  (4.27) 

if  x - k < 0,  k is  set  to  the  minimum  value  of  x,  c and  d are  the  coefficients 
derived  for  Equation  (4.21).  The  hyperbolic  sine  coefficients,  g and  h,  can  be 
fitted  using 


h = I , g = 3hd/c  (4.28) 

• c - 3db 

where  b,  c,  and  d are  coefficients  derived  for  Equation  (4.21).  If  the  term  in  the 
radical  is  negative,  make  it  positive  before  attempting  t-  take  the  square  root  and 
then  set  h negative. 

4.5.  A Transno rrnalizatlon  Example. 

The  above  methods  were  used  on  a variety  of  meteorological  parameters  — temper- 
ature. relative  humidity,  win  1 speed,  eighths  of  sky  cove  •,  10CC-500  mb  thickness, 
etc.  — to  learn  how  well  the  transnormalization  procedures  fit  observed  distribu- 
tion::. Figure  8 shows  a 'ypical  example  incluiing  various  fits  to  the  Pittsburgh 
1900  L3T  surface  wind  speed  for  October  and  November  for  the  years  1961  through  1965. 
Statistical  parameters  for  the  distribution  are  shown  in  Table  1. 


Table  1.  Statistical  Parameters  for  Pittsburgh  Wind  Speed. 


Moments  7.15 

18.4 

66.0 

1626. 

Cumulants 

18.3 

66 . 0 

. 

Skewness  = .839  Kurtosis  = 1. 

86 

Number 

of  cases  = 

285 

Pearson's  S1  = .696  Dg 

= 4.81 

Max 

Constant 

1st 

2nd 

3rd 

RMSE 

Error 

Cornish-Flsher  Coefficients  G.14C 

1.10 

-0.140 

0.040 

0.027 

0.079 

lst-Order  Regression  -1.25 

0.187 

-- 

-- 

0.029 

0.1.  - 

2nd-0rder  Regression  -1.70 

0.305 

-0.00490 

-- 

0.026 

0.077 

3rd-0rder  Regression  -1.59 

0.243 

d . 00167 

-0.000183 

0.025 

0.084 

Usaole  range  of  2nd-order  regression  fit  is 

from  minu 

s Infinity 

to  31.1. 

Usable  range  of  Cubic  fit  is  -5^.7 

to  73.0. 

Max 

A 

B 

C 

D 

RMSE 

Error 

x = ( (x-c )/ ( D-x) ) 1.14 

0.841 

-0.500 

25.5 

0.044 

0.114 

x = A+B*Sinh~^ ( c*x+d ) -O.765 

-3.36 

-0.0664 

0.203 

0.027 

0.092 

x = A+B*f ( x+c ) -7.38 

2.99 

6.10 

-- 

0.029 

0.067 

Gram-Charlj  er  0 0 

■'f(x)  = Z (Z+BS+Cs  tDs3 )+4  ( s ) 0.419 

0.233 

-0.140 

-0.078 

0.026 

0.060 

FRACTION  PER  KNOT 
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Figure  8.  Transnormalized  Distribution  Approximations  to  the  Pittsburgh  Surface  Wind 
Speed.  Approximation  parameters  are  given  in  Table  1.  The  2na-order  polynomial 
graph  is  nearly  identical  to  the  3rd-oraer  polynomial  graph.  Some  of  the  approxima- 
tions are  poor  due  to  recording  errors  of  the  anemometer  and  observer. 


Some  anomalies  in  the  winn  speed  distribution  are  typical  of  observed  meteoro- 
logical distributions  in  general.  Because  the  anemometer  does  not  respond  to  a wind 
less  than  3 knots,  speed  of  1 and  2 knots  are  not  recorded  causing  a subsequent 
increase  in  the  number  of  calm  winds.  Observer  preference  for  numbers  divisible  by 
five  shows  up  in  the  high  observed  frequency  of  5-  and  10-knot  winds.  Surprisingly, 
the  15-knot  speed  does  not  show  a high  observed  frequency. 

The  root  mean  square  errors  (KMSE)  and  maximum  errors  of  various  transnormaliz- 
ing  procedures  are  given  in  Table  1.  The  term  error  may  be  a misnomer  since  some 
of  the  fittei  curves  may  be  closer  to  the  actual  distribution  than  the  observed 
requency  which  includes  instrument  and  observer  bias. 

4.6.  Selecting  a Transnormalization  Procedure. 

None  of  the  transnormalization  procedures  work  perfectly  for  all  situations. 
Furthermore,  since  the  data  i'self  contain  measurement  errors  and  sampling  errors, 
a good  fit  on  one  set  of  data  is  not  a guarantee  that  the  transnormalizer  will  do 
well  on  more  data  of  the  same  kind.  Selection  is,  therefore,  something  of  an  art. 
The  following  strategy  is  tentatively  suggested: 

a.  If  the  variable  has  been  known  to  fit  a given  distribution,  try  using  the 
distribution  in  a two-stage  method  (section  4.3.3^. 

c.  If  the  iata  have  been  already  classified . use  histogram  interpolation  (sec- 
tion 4.3.1)  or  polynomial  regression  (section  4.4.2).  If  the  classes  are  unequally 
spaced,  polynomial  regression  is  the  better  choice. 

c.  If  the  data  are  continuous  with  an  unknown  distribution,  the  Cornish-Fisher 
expansion  will  generally  fit  the  data  well  (section  4.4.1). 


19 


AWS  Technical  Report  75-259 


December  1976 


4.7.  Procedures  for  Inverse  Transnormalization. 

The  regression  probability  method  requires  that  an  END  be  transformed  back  to  a 
cumulative  probability,  which  is  the  normal  probability  integral  as  explained  in 
section  3.2.3-  Hastings  (1955)  developed  the  following  rational  approximation: 

e > 0:  ‘I'(e)  = 1 - i(l  + c^e  + c2e2  + c^e3  + c^e^)_iJ 

e < 0:  ^(e)  = £(1  - c^e  + c^e2  - c^e3  + (4.29) 

where  e is  an  END. 

P is  the  cumulative  probability 

c1  = 0.196854  c2  = 0.115194  c3  = 0.000344  c^  = 0.019527 

The  absolute  error  is  less  than  0.00025,  which  represents  more  than  enough  accuracy 
for  probability  forecasting. 

Muench  developed  the  following  approximation  which  is  very  handy  for  calculators 
(Touart,  1973): 


P(e ) = [1  + Tanh  (0.8e  + 0.035  e3)]/2 


(4.30) 


Error  is  less  than  0.0004  in  magnitude. 

4.8.  Transnormalization  and  Marginal  Predictors. 

Most  of  the  transnormalizing  methods  discussed  the  use  of  three  or  four  param- 
eters to  specify  the  transformation.  In  general,  the  value  of  these  parameters  will 
vary  depending  on  the  time  of  day  or  season  of  the  year;  thus,  the  parameters  become 
functions  of  the  marginal  predictors.  An  example  of  this  can  be  seen  in  the  visi- 
bility transnormalization  method  used  at  Cambridge  Research  Laboratories  (now  Air 
Force  Geophysics  Laboratories ) (Chisholm,  et  al.,  1974).  In  this  case  the  END  of 
visibility  (v)  is  given  by  a lognormal  transformation: 

v = L + K 0n(v)  (4.31) 

where  v is  visibility  and  L and  K are  functions  of  time  of  day: 

K = A • H + B,  L = C • H + D (4.32) 

where  H is  the  time  before  or  after  sunrise,  and  A,  B,  C,  D are  climatologically 
determined  constants  with  one  set  for  winter  and  one  for  summer.  The  time  of  cay, 
as  measured  by  H,  will  have  an  important  effect  on  the  value  of  v.  Climatological 
probability  for  any  value  of  visibility  for  any  time  of  day  is  stored  in  only  four 
numbers'. 

The  final  point  to  remember  is  that  the  transnormalization  parameters,  includ- 
ing the  marginal  predictor  transformations,  are  based  on  the  relatively  long  clima- 
tological data  base  rather  than  a specific  development  data  base  of  paired  predictor 
and  predictand  observations. 
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Chapter  5 


CORRELATION 


Correlation  Is  required  to  calculate  the  regression  probability  coefficients. 

A specific  type  of  correlation,  found  in  the  multivariate  normal  equation,  must  be 
used.  In  the  bivariate  normal  equation,  correlation  (p)  appears  in  three  places 
(Anderson,  1958): 


EXP  (-  £(*  - 2exy  - y2)/(l  - o2)) 


(5.1) 


where  f is  the  probability  density,  and  p is  correlation. 

If  Equation  (5-1)  is  used  as  the  basic  model,  how  can  an  estimate  of  n be  cal- 
culated if  x or  y has  been  distorted  or  categorized?  Distortion  is  handled  effec- 
tively by  transnormalization.  To  handle  continuous  or  categorized  variables  proper- 
ly, several  methods  have  been  developed  to  calculate  correlation. 

5.1.  The  Pearson  Product  — Moment  Formula. 

If  Xi  and  y<,  i = 1,  ...,  N,  are  a series  of  paired  observations  and  x and  y 
are  each  normally  distributed  (i.e.,  have  been  transnormalized) , an  estimate  of  the 
correlation  (r)  can  be  calculated. 


n i (xi  - x)(yj  - y) 


where  the  summation  is  i 


s s 
x y 


, , N and  x and  y are  means: 


x = I \ x 
x N L i 


(5-3) 


sx  and  Sy  are  standard  deviations: 

sx  = S L (xi  - » sy  = [II  (yi  - y>? 

An  equivalent  and  easier  to  compute  formula  is* 

Nd  x±yi)  - (I  \)  (I  yt) 

r' a ^ [«&»?) <5'41 

If  x and  y are  large  numbers  with  means  not  equal  to  zero.  Equation  (5.4)  can  have 
considerable  round-off  error,  but  with  transnormalized  variables,  this  error  is 
negligible.  If  x and  y have  been  transnormalized,  their  means  should  equal  zero 
and  standard  deviation  equals  one.  Thus,  Equation  (5.2)  could  be  written: 


1 V it  ii 

r = N 1 Xi  yi 


However,  experience  has  shown  that  because  of  sampling  error  (the  mean  of  a given 
sample  will  generally  not  equal  zero)  and  transnormalization  error,  Equation  (5.5) 
will  often  give  misleading  results  (e.g.,  r = 1.2)! 

The  standard  error  (E  ) of  r when  r is  calculated  by  Equation  (5*4)  is 
(Elderton,  1953,  p.  193) :PP 
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Eppm  = (x  " r2)/vto  - 1 (approximation)  (5*6) 

5.2.  Blserlal  Correlation. 

Suppose  a normally  distributed  variable  is  divided  into  two  categories.  If  the 
variable  is  below  some  critical  value  it  is  placed  in  category  1,  otherwise  it  is 
put  into  category  2:  for  example,  temperature  above  or  below  freezing.  If  the 
above-  or  below-freezing  category  is  given  for  one  location  and  the  actual  tempera- 
ture is  given  at  another  location,  is  there  a way  to  calculate  correlation  between 
the  temperatures  at  the  two  locations?  The  biserial  correlation  formula  (Pearson, 
1909)  is  the  answer: 

r = IL-JIJ  £ ° (5.7) 

s y 1 1 

where  "K  is  the  mean  value  of  the  continuous  variable  for  those  cases  that  are  ir. 
the  category  above  the  critical  value 

5 is  the  mean  value  of  the  continuous  variable  for  the  cases  below  the  criti- 


cal 

value 

p 

is 

the 

fraction 

of 

cases  in  the 

upper  category 

q 

is 

the 

fraction 

of 

cases  in  the 

lower  caregory  (i.e.. 

q ■ 1 - p) 

s 

is 

the 

standard 

deviation  of  the 

continuous  variable 

y 

is 

the 

ordinate 

of 

the  normal  curve  at  the  equivalent 

critical  value 

y = EXP  (-  £ q2)  (5.8) 

and  a is  the  END  for  the  fraction  q. 

q can  be  calculated  by  Equations  (4.16)  or  (4.17). 


Guilford  (1973)  gives  an  equivalent  formula  that  is  slightly  easier  to  compute 
since  only  the  mean  of  one  category  needs  to  be  compiled.  The  required  overall  mean 
of  the  continuous  variable  is  a by-product  of  calculating  the  standard  deviation. 

The  alternate  formula  is: 


r = 


(5.9) 


where  A,  p,  y,  an:  s are  defined  ir.  Equations  (5-7)  anc  (5.8);  5 is  the  overall  mean 
of  the  continuous  variable.  The  standard  error  (EK)  of  the  biserial  correlation 
coefficient  is  (Guilford,  1956j  Equation  14.8): 


E 


b 


(5.10) 


where  p,  q,  y,  and  r are  defined  in  Equations  (5.7),  (5-8),  and  (5.9);  N is  th« 
number  of  cases. 


Equations  (5-9)  and  (5-10)  are  based  on  a normal  (or  transnormalized)  variable 
which  has  been  categorized  into  two  parts.  If  a variable  is  categorized  by  nature, 
e.g.,  snow  versus  rain,  wc  assume  that  there  is  a continuous  anc.  normal  underlying 
variable.  In  the  case  of  rain  versu;  snow,  this  variable  could  be  a type  of  mear. 
temperature  in  the  lowest  layers  of  the  atmosphere.  The  underlying  variable  for 
many  meteorological  categorized  variables  is  obvious  but,  whether  there  i.  ar. 
underlying  variable  is  not  the  key  point.  The  key  point  is:  if  ar.  underlying  vari- 
able is  postulated,  how  good  will  the  probability  forecast  be?  Since  a formal 
answer  is  not  possible  the  approach  must  be  philosophical.  Assume  an  underlyin'- 
distribution,  then  calculate  a correlation.  This  calculation  is  then  use  to  calcu- 
late repression  probability.  The  consistent  use  of  this  assumption  throughout  th< 
calculations  avoids  some  of  the  potential  error.  The  verifications  done  by  McCaoe 
(1968),  Gringorten  (1971),  Boehm  (1973)!  and  Martin,  Hull,  and  Chin  ( 1 9'7 3 ' show  the- 
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the  assumption  work::  well  in  the  cases  tested. 
5*3.  Tetrachorlc  Correlation. 

If  both  variables  are  categorised  into 
two  categories,  the  result  is  a fourfold 
or  tetrachoric  table  (as  seen  at  the 
right),  where  A,  B,  C,  and  D are  the  num- 
ber of  occurrences  above  or  below  the 
critical  values  of  the  respective  varia- 
bles. 

Unfortunately,  there  is  not  a simple 
exact  formula  for  calculating  correla- 
tions for  a fourfold  table.  The  exact 
expression  containing  r is  (Elderton, 

1953,  P-  177): 


VARIABLE  1 
(CRITICAL  VALUE) 


BELOW 


A30VE 


W BELOW 


K ABOVE 
> 


where 


*K(x) 


0K(X) 


A/N  = \ T1(x)  r1(y ) r1 

i=0 

A is  the  value  from  the  tetrachoric  table 
N is  the  total  cases,  N=A+B+C+D 
x is  the  END  of  (A  + B)/N 
y is  the  END  pf  (A  + C)/N 
T is  the  ith  tetrachoric  function 

v*)  = l-.uVV-lLxl 

is  the  Kth  derivative  of  the  normal  density  function: 


0K(x)  = -i— r -i-  EXP  (-1  x?) 
dx‘ 


(5-11) 


(5-12) 


(5.13) 


is  easily  calculated  by  the  recurrence  equaci ~n  (Abramowitz  and  Stegun,  lgtL): 


0K(x)  = - x0K-i(x)  - (K  - l)*K-2(x) 
for  K > 1 and  with  starting  values  for  K = 1 and  K = 0: 

0°(x)  = — EXP  (-£  x2) 

J2* 


0i(x)  = -X0°(x) 


also. 


-.-l/"*  = A-Hl  ^-1,"^  = A+c  7 Superscripts  here  are! 

' ' n * ' n Linlex  not  inverse 


(5.14) 


(5-15) 


(5.16) 


(5.16) 


Abramowitz  and  Stegun  give  the  following  series  expansion  which  can  be  derived  from 
Equations  (5.11)  and  (5 .12): 
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*/»  , I*  * »)4»-+c)  + \_ 

N i=0 


(5.18) 


However,  0i(x)  alternates  in  sign  and  can  grow  beyond  computer  limits  if  x > 1.  To 
solve  this  problem,  the  factoral  and  power  of  r were  incorporated  into  the  differ- 
ence Equation  (5.14): 


A/N  = XAJL^^A^Cl  + ^ R1(x)S1(y) 


(5-19) 


where  Ri  = -xR1_1/(i  - 1)  - (i  - l)R1_2/i(i  - 1) 


Ro=0°(x),  R1  = 


-x*°(S)/2 


(5-20) 


and  = -ySi_1r  - (i  - l)Si_2r? 

So=r0°(y),  Sx  = -yr20°(y) 

If  the  infinite  series  of  Equation  (5.19)  is  truncated  after  two  successive  terms  of 
the  series  are  smaller  in  magnitude  than  the  desired  error,  possible  divergence  is 
prevented  and  the  calculations  will  remain  within  computer-  limits.  To  solve  for  r, 
a method  for  finding  the  zeros  of  a polynomial  must  be  used.  Elderton  (1953)  rec- 
ommends Newton's  method.  The  Scientific  Subroutine  Package  has  a subroutine  TETRA 
which  uses  r to  the  seventh  power  and  uses  a Newton-Raphson  method  to  find  thei(root. 
However,  seven  terms  are  not  sufficient  to  give  adequate  accuracy  for  a large  x 
or  y. 

The  number  of  terms  must  be  specified  in  advance  when  using  any  polynomial  zero- 
finding method.  To  overcome  this  drawback,  an  algorithm  based  on  the  false  position 
method  is  recommended.  Evaluate  the  polynomial  at  two  initial  guess  values,  and 
then  use  linear  extrapolation  to  find  a better  estimate. 

Let  U(r)  = A/N  as  calculated  by  Equation  (5.19). 

Convenient  first  guesses  are: 


r±  = 0,  U(rx)  = (A  + B) (A  + C)/N£ 


r2  = Sin 


TT  Jad  - M 

— 

Tad  + vBS 


U(r2)  is  found  using  Equation  (5.19). 
The  improved  estimate  of  r is  found: 


ri  = ri-l  - (ri-l  - ri-2) 


U(1_1)A/N 


(5.21) 


(5.22) 


(5-23) 


The  process  is  stopped  when: 
lU^)  - A/N  | < e 
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where  c Is  much  smaller  than  the  acceptable  error,  e.g.,  r = 1 x lo”  . 

Equation  (5.22)  is  sometimes  specified  as  the  equation  for  tetrachoric  correla- 
tion; however,  it  is  only  an  approximation.  It  is  accurate  when  (A  + B)/N  = 0.5 
and  (A  + C)/N  = 0.5,  but  for  values  near  one  or  zero  for  (A  + B)/N  and  (A  + C)/N, 
Equation  (5.22)  contains  sizable  error  (Castellan,  1966). 

The  standard  error  (E. ) of  the  tetrachoric  correlation  was  first  derived  by 
Pearson  (1913a):  1 

E = — [(a  + d) (c  + b)/4  + H2  (a  + c)(d  + b)  + G2  (a  + b)(d  + c) 

+ 2GH  (ad  - be)  - H (ab  - cd)  - G (ac  - bd)]"*  (5-24) 

where  a = A/N,  b = B/N,  c = C/N,  d = D/N;  A,  B,  C,  D are  the  values  in  the  fourfold 

table  which  has  been  arranged  so  that  A + C > B + C and  A + B > C + D;  and 

G = P [(x  - ry )/,/l  - r2] 

H = P [ (y  - rx)/./l  - r2] 

P is  the  cumulative  normal  function  defined  by  Equation  (4.12)  and  calculated 
by  Equations  (4.29)  or  Equation  (4.30) 

x is  the  END  of  (A  + B)/N 

y is  the  END  of  (A  + C)/N 

5.4.  Other  Methods  of  Calculating  Correlation. 

There  are  many  ways  to  calculate  correlation  beyond  the  three  methods  described 
above.  A few  of  the  more  prominent  ones  are  mentioned  for  illustration. 

Kendall  (1948)  gives  a generalized  equation  for  correlation.  Spearman's  rank 
correlation,  Kendall's  rank  correlation,  as  well  as  the  Pearson  product  moment 
formula  can  be  generated  from  the  general  equation.  The  rank  correlation  method 
can  give  estimates  of  the  bivariate  normal  correlation  (Kendall,  1948,  Equation  9.2). 
However,  they  suffer  from  the  same  drawback  as  the  rank  transnormalization  method 
in  that  they  require  a sort  which  is  expensive  in  computer  time  for  large  sets  of 
data. 

The  coefficient  of  contingency  combined  with  a correction  factor  (Elderton, 

1953)  can  be  used  in  the  case  where  both  variables  are  split  into  many  categories. 
Herring  (Touart,  1973)  used  a minimum  chi-square  fit  to  calculate  correlation  in  a 
contingency  table.  The  disadvantage  of  these  two  methods  is  that  they  weigh  each 
cell  in  the  contingency  table  equally.  If,  as  is  often  the  case  in  meteorology, 
one  or  several  of  the  cells  have  only  a few  cases,  the  standard  error  of  the  method 
will  be  large. 

5.5.  Selecting  a Correlation  Procedure. 

If  both  variables  are  continuous  and  have  been  transnormalized,  the  Pearson 
product  moment  formula  should  be  used  since  it  has  a smaller  standard  error  than 
the  tetrachoric  or  biserial  methods. 

Many  continuous  variables,  however,  are  categorized  by  the  measuring  instruments 
or  by  the  method  of  encoding  the  data.  For  example,  sky  cover  is  reported  in  eighths 
or  tenths  rather  than  a continuous  variable  from  zero  to  one.  Pearson  (1913b)  cal- 
culated the  error  for  various  numbers  of  categories.  For  five  categories  or  less, 
there  is  more  than  a 5%  error  in  the  calculated  correlation.  Therefore,  if  there 
are  five  or  less  categories,  the  biserial  or  tetrachoric  procedure  is  recommended  . 
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These  categories  should  be  divided  into 
number  of  cases. 

For  6 to  14  equal-size  categories, 
the  product  moment  method  can  be  used, 
using  the  mid-points  of  the  classes  and 
the  corrections  Cx  and  Cy,  which  depend 
on  the  number  of  categories  for  each 
variable.  The  information  in  the  table 
on  the  right  is  taken  from  Pearson 
(1913b).  The  corrected  value  of  cor- 
relation r'  is: 

r 1 = 


two  groups  so  that  each  has  about  the  same 
Number  of  Classes  Lx  0r  Cy 


6 

.960 

8 

.977 

10 

.985 

12 

.989 

14 

• 992 

r/CxCy  (5.25) 


For  15  or  more  classes,  the  Pearson  product  moment  formula  could  be  used  directly 
with  little  error. 

Many  meteorological  variables  are  truncated  beyond  a certain  value.  For  example, 
visibility  is  reported  only  up  to  6 miles  in  the  aviation  METAR  code.  All  values 
greater  than  six  are  reported  only  as  "six  plus."  In  this  case,  the  biserial  or 
tetrachoric  formula  will  give  a better  estimate  of  correlation  than  the  product 
moment  formula  because  all  the  values  of  "six  plus"  will  cause  a spike  in  the  dis- 
tribution. The  difference  between  the  tetrachoric  and  product  moment  correlation 
estimate  can  be  large.  Tetrachoric  correlation  is  easier  to  use  than  the  Pearson 
product  moment  correlation  when  handling  large  data  samples  which  exceed  available 
computer  memory. 

The  product  moment  formula  requires  two  passes  through  the  data,  once  for  trans- 
normalizing  and  once  for  calculating  correlation  using  the  trans normalize  a varia- 
bles. The  tetrachoric  formula  requires  only  one  pass  since  the  four  required  values 
can  be  accumulated  during  the  transnormalization  data  pass.  The  higher  standard 
error  of  the  tetrachoric  method  is  not  a significant  factor  with  large  data  samples. 
There  is,  however,  one  disadvantage.  The  boundary  dividing  the  two  classes  must  be 
specified  in  advance  when  using  the  tetrachoric  method. 
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Chapter  6 

REGRESSION  PROBABILITY 


Regression  probability  is  based  on  the  multivariate  normal  distribution.  De- 
tails of  the  multivariate  normal  distribution  can  be  found  in  texts  by  Anderson 
(1958),  Miller  (1964),  Hogg  and  Craig  (1965),  and  many  others.  Analysis  of  the 
multivariate  normal  is  such  a broad  field  that  only  the  regression  probability 
equation  will  be  derived.  As  much  as  possible,  Anderson's  notation  is  used. 

The  following  derivation  of  the  regression  probability  equation  first  defines 
general  multivar'ate  normal  terminology.  The  only  assumption  made  in  the  derivation 
of  the  TRP  model  is:  Given  that  the  predictand  and  the  predictor  variables  have 
been  individually  transnormalized,  they  are  distributed  joint  normally. 

6.1.  The  Multivariate  Normal. 

Let  X be  a matrix  of  observations: 


;11 

x12  • ' 

X1M 

;21 

x22  ‘ ' 

' ’ X2M 

N1 

XN2  • ' 

' ' XNM 

where  the  first  subscript  indicates  to  which  observation  the  set  of  measurements 
belongs,  e.g..  May  3>  May  4,  May  5,  etc.  The  second  subscript  indicates  which  ele- 
ment is  being  measured,  e.g.,  temperature,  wind  speed,  pressure,  etc. 

If  Z is  the  matrix  of  deviation  from  the  mean,  Zji  = Xj_ j - then  the  covari- 
ance maTrix  _T  is  defined  by  (superscript  T indicates  transpose): 


£ = 


T 1 

— T—  N 


CT11  °12 
°M1  °M2 


alM 

°MM_ 


(6.2) 


Any  individual  element  of  V is  given  by: 

N 

°ij  = N id  Zik  Zjk  = ZiZj 
k=l 


(6.3) 


which  means  the  matrix  P is  symmetric:  a.  = a... 

J «J  1 

Let  X be  divided  into  two  sections: 


^ll1 

x12 

X13  ’ 

• xim" 

x(D  = 

X21 

x<2>  - 

X22 

x23  • 

• X2M 

_XN1_ 

_XN2 

XN3  • 

• XNM_ 
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°11 

; °12 

°13  ••• 

alM 

v — 

111 

I12" 

°21 

l°22 

°23  • • ‘ 

ct2M 

? — 

_— 21 

Ip  2_ 

_°M1 

1 

1 

1 

| °M2 

°M3  " • 

fn  is  the  variance  of  the  predictand,  while  T21  and  £12  are  the  vectors  each  of 
which  contain  the  covariances  between  the  predictand  and  each  predictor.  is 

the  matrix  of  covariances  between  predictors.  According  to  Anderson's  definition 

2.5.1,  I1?  2'^  is  the  matrix  of  regression  coef ficients . That  is,  in  a multiple 
regression  equation: 


= C + a2  x2  + + 


aM  XM 


the  a's  are  given  by: 


7 v--*- 

-12  -22 


aM  = - 


(6.6) 


(*.7) 


Equation  (6.6)  begins  with  a2  X2  because  of  the  manner  in  which  x has  been  divided. 
That  is,  xp  is  the  first  predictor.  C in  the  equation  is  not  important  in  this 
development  since  it  will  equal  zero  when  x^,  xp,  ...  x^  are  transnormalized  varia- 
bles with  means  equal  to  zero.  Suppose  the  distribution  of  X is  multivariate  normal. 
Then,  the  frequency  density,  f(>C),  is  given  by: 

f(X)  = EXP  [-  i(x  - u)T  7"1  (x  - J/(2tt)H/2  = N (u,  7)  (6.8) 


where  x is  the  vector  of  the  ith  measurement 

u is  the  vector  of  the  mean  value  of  the  ith  measurement 
Superscript  T indicates  transpose 

Equation  (6.8)  describes  a multidimensional  density  with  maximum  density  at  the 
mean.  Surfaces  of  equal  density  are  hyperellipsoids  with  eccentricity  depending  on 
the  covariance  matrix.  The  notation  N(u,  _7)  emphasizes  that  the  distribution  is 
completely  specified  by  the  mean  vector  ji  and  the  covariance  matrix  _7. 

6.2.  Conditional  Probability. 

The  conditional  distribution  of  given  a specified  is,  by  Anderson's 

theorem  2.5.1: 


f(X(1)|x2,  X3  ...  XM)  = Nt^1)  + 712  (x(2)  - J2))  21X  - S12  7^  (6.9) 

Thus,  for  x2,  Xj,  ...  xM  held  constant,  the  distribution  of  x(1)  (the  predictand) 
is  normally  distributed,  N(m,  v),  with  mean  m and  variance  v.  If  the  variables  in 
X have  1^  = 0 and  = 1 (transnormalization  insures  that  they  will),  the  distri- 
bution is  called  the  standard  normal  and  the  equations  are  simplified: 


.,(1)  + y y-1  /v(2)  . M 

11  T _12  -22  H 

!))  - (I12  lip)  I(2) 

(6.10) 

v = 7 - 7 r--'-  y -*•  1 

-11  -12  -^-22  -21 

(Il2  Ipp)  Ipi 

(6.11) 

and  further  simplified  by  substituting  Equation  (6.7). 
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(6.12) 


v 1 ■ (L\2  I22)  -21  1 " A —21  1 (ap^pi  + a5°ii  + ...  + am°ml) 


-12  ^22 1 -21 


-21 


22!  a3  31 


m ml ' 


(6.13) 


r 


1 


or  using  Anderson's  definition  2.5.3  and  the  fact  that  an  = 1 for  transnormalized 
variables,  the  multiple  correlation  coefficient  (R)  is  given  by: 


T~\  T, 


—12  ^22  -121  _ ,L  y-l  v 
-12  -22  -21 


^11 


r2  -12  ^22  ^21  - ^21 


(6.14) 


(6.15) 


Thus,  from  Equations  (6. 13)  and  (6.15),  the  variance  of  the  conditional  distribution 
can  be  written: 


v = 1 - Rc 


(6.16) 


and  the  standard  deviation: 


s = - Rc 


(6.17) 


Using  notation  Pr(Q)  as  the  probability  of  Q,  and  ^(Q)  as  the  normal  cumulative 
probability  less  than  Q: 


Pr(x(1)  <x(1)|x2,  x3,  ...  xm)  = *(x(  V m) 


(6.18) 


or  using  Equations  (6.12),  (6.13) > and  (6.17): 

/ -1  \ / -i  \ /x|  ^ “ a0x0  ~ a^x^  “ ...  a x \ 

Pr(x(1)  <x[1)|x2,  x3  ...  xm)  = • (-1 S-l 2_2 SJS> 


(6.19) 


At  this  point  relabel  X^1^  as  Y to  signify  that  it  is  the  predictand,  and 
relabel  xg  as  p-^  to  show  that  it  is  the  first  predictor,  x3  as  p2,  etc.,  then: 

11  «»  »»  ^ ii 

[V  - aiP;L  - a2P2  " • • • aKH^ 


Pr(Y  < y |p2,  P2,  . . • , Pk)  = * (- 


Jl  - 


(6.20) 


where  y is  the  transnormalized  boundary  of  the  predictand  category 
p^  is  the  ith  transnormalized  predictor 

a3  are  the  regression  coefficients  given  by  Equation  (6.7) 

R is  the  multiple  correlation  coefficient  given  by  Equation  (6.13) 
K is  the  number  of  predictors,  K = m - 1 

6.3.  Regression  Probability  for  One  and  Two  Predictors. 

In  the  case  of  a single  predictor  Equation  (6.19)  becomes,  as  shown  by 
Gringo rten  (1972): 

Pr(Y  < y|x)  = * (*■  — r2~) 

Jl  - r2 


(6.21) 
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where  r is  the  transnormalized  category  boundary  and  is  also  the  correlation  between 
the  transnormalized  predictor  p and  the  transnormalized  predictand  y. 

Gringorten  (1974)  also  derived  the  two-predictor  case  from  Equation  (6.19): 

II  A M A «l 

,y  - A,p.  - A„p„  N 

Pr (Y  < y |x,  , x„)  = <t>  ' ■ ■ 11 ) (6.22) 

A - A^i  - Agr2 


whe  re  2 

A-^  = (r^  - r^2r2)/(-*-  ” ri2^ 

2 

A2  = (r2  ' rl2rl)/(1  _ r12 ^ 

and  r^  is  the  correlation  between  the  first  predictor  and  the  predictand 

r2  is  the  correlation  between  the  second  predictor  and  the  predictand 

r12  the  corre-'-a'ti°n  between  the  predictors 

6.4.  Probability  for  a Categorical  Predictor. 

Equation  (6.20)  solves  for  a predictor  equal  to  a specific  value  (x) . To  cal- 
culate the  probability  for  a categorical  predictor,  each  value  of  x in  the  category 
must  be  used  - each  weighted  by  the  probability  that  it  belongs  to  the  category. 
Defining: 

0(x)  = (IA^tt)  EXP  (-  i x2)  and  g = (y  - ox ) /,/l  - 
then  this  weight  is 


(x)/Pr(x1  < X < x2) 


Thus : 


Pr(Y  < y|xx  < X < x2)  = ^FtxT1  = W(YJ  S <5(e)  dx 

X1 

or  since  d 'f'(x)  = 0(x)  dx: 

^(Xg) 

Pr(Y  < y Ix^^  < X < x2)  = pprvy  \ 4 (g)  d *(x) 

*(x,) 


(6.22a) 


(6.23) 


l 


Notice  that  Equation  (6.23)  is  the  mean  conditional  probability  for  the  various 
values  of  x in  X. 

Equation  (6.23)  can  be  integrated  numerically  using  the  following  scheme: 

N 

Pr(Y  < y|x1  < X < xg)  = ^ (6.24) 

i=l 


gi  = 


y - o'f’-1  [‘t’^)  + (i  - \)  A't(x)] 


, N = (32  [*(x2)  - *(x1)]J 


■n  - r 

Brackets  (())  are  used  to  show  that  N is  rounded  to  nearest  integer,  so  that 
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N :* i ( x ) = *(x2)  - *(x1)  , or  A«t>(x)  ~ 1/32  , i.e.,  3 % 


Equation  (6.24)  is  an  open  ended  method.  Thus,  values  at  end  points  need  not  be 
calculated. 

Error  is  proportional  to  the  second  derivative,  d 4>(g)/dt>(x)  . Error  is  small, 
usually  much  less  than  1 % for  | n | <0.8  or  for  d’(x)  not  close  0 or  1.  Error  as  high 
as  7%  has  been  observed  with  n = 0.92  and  ^(x^)  = 0,  <J)(x2)  = 0.02. 


If  one  predictor  is  categorical  (Xc)  and  there  are  several  other  predictors 
(x-^,  Xg  . ..)  that  are  exact  values.  Equation  (6.23)  is  modified  thereby: 


Pr(Y  < y |xcl  ^ Xc  < xc2’  x2,  x3  • • • ) = Ptxjx.,  x. 


*(x2c) 


*(&.)  d<K(X„)  (6.25) 


where  g„  = 


y ' AcXc  " A2x2  " A3x3  •** 


Ac,  Ai,  etc.,  are  regression  coefficients  given  by  Equation  (6.7)  and  R is  the 
multiple  correlation  coefficient  given  by  Equation  (6.15). 

Equation  (6.25)  can  be  integrated  numerically  using  (6.24)  but  redefining  : 


y - Ac  P-J-  P(xlc)  + (i  - i)  AP(xc)  - AgXg  - 


(6.26) 


.11  - R 


If  two  predictors  are  categorical,  a double  integral  (numerically,  a iouble 
summation)  is  required  to  calculate  the  conditional  probability.  However,  if  the 
simple  correlation  between  the  categorical  predictor  and  the  predictand  is  not  too 
large,  i.e.,  less  than  0.6,  the  equivalent  normal  deviate  of  the  median  of  the 
category  can  be  used  as  an  approximation. 

6.5.  Multiple  Predictand  Categories. 

One  of  the  strengths  of  the  TRP  model  is  the  use  of  calculating  conditional 
probabilities  for  multiple  categories.  The  mean  predictor 


(m,  m = a1x1  + a^2  + ...  aDx  ) 


and  the  standard  deviation 


(s,  s = ,/l  - R ) 


need  only  be  calculated  once  for  all  predictand  categories.  The  conditional  prob- 
ability of  the  kth  predictand  category  is  given  by: 


Pr(yw.i  < Y < x 


k1  i»  a2  •••  p> 


= « ^ _ * ^K-l  - 


(6.27) 


where  yK  is  the  upper  transnormalized  bound  of  the  category  and  is  the  lower 

transnormalized  bound.  For  K = 1,  Equation  (6.27)  becomes: 


Pr (Y  < yk.)  = 


(*Y.  - m\ 


(6.28) 


For  the  uppermost  category.  Equation  (6.27)  becomes: 
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/Y I s ■]  “ 

Pr(yK-l  *Y)  = 1 ' * ) (6.29 

6.6.  Changing  Predictand  Categories. 

If  the  predictand  categories  are  changed,  it  is  not  necessary  to  redevelop  the 
regression  coefficients;  only  the  new  transnormalized  predictand  boundaries  are 
required.  This  capability  gives  the  TRP  model  great  flexibility  since  the  method 
can  be  developed  for  a specified  category,  e.g.,  rainfall  greater  than  1/2  inch, 
and  then  operationally  used  for  another  category,  e.g.,  rainfall  greater  than  1 inch. 
Furthermore,  this  capability  allows  probabilities  to  be  calculated  for  any  category 
upon  request  in  a real-time  query  response  mode.  Another  result  of  this  capability 
is  that  probabilities  can  be  forecast  in  an  objective  manner  for  categories  that 
have  never  occurred  — yet. 


f 
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Chapter  7 

EXAMPLES  AND  APPLICATIONS 


Two  detailed  examples  that  illustrate  the  TRP  model  are  contained  in  this 
chapter.  These  examples  show  the  diverse  ways  the  TRP  model  can  be  developed  and 
are  not  intended  for  operational  use.  Operational  use  of  the  TRP  model  has  been 
described  by  Boehm  (1973)  and  Young  (1975). 

7-1.  Overall  Development  and  Verification  Procedures. 

In  both  of  these  examples,  data  were  divided  into  a dependent  data  set  and  an 
Independent  data  set.  Considerable  care  v/as  taker,  to  keep  the  independent  data 
from  influencing  in  any  way  the  development  of  the  TRP  coefficients.  Multiple 
discriminant  analysis  (MDA)  coefficients  were  also  calculated  with  the  dependent 
data  using  the  BMD05M  computer  program  (Dixon,  1973)- 

Both  TRP  and  MDA  procedures  were  verified  using  a variety  of  skill  scores  and 
diagnostic  parameters,  but  most  reliance  was  placed  on  the  Brier  score  (Brier,  1950) 
and  per  cent  correct  forecasts  using  the  most  probable  forecast.  The  Brier  score 
(PS)  is  calculated: 

N K 

- y.  1 I <pu  - Vs  d-b 

i=l  j=l 


where  N is  the  number  of  categories 
K is  the  number  of  forecasts 

. is  the  probability  for  category  i for  the  jth  forecas 
j = 1 if  category  i was  observed  for  the  jth  forecast, 

A perfect  score  is  zero;  the  worst  posible  score  is  2. 


otherwise 


i 


0 


7.2.  Probability  of  Frozen  Precipitation. 

Walker  (1974)  collected  1000-500  mb  thickness  and  the  corresponding  type  of 
precipitation  for  13  European  locations  for  November  and  December  1973-  For  this 
example,  the  data  were  randomly  divided  into  two  sets:  one  set  of  six  locations  for 
the  "dependent  data  set  (195  cases)  and  one  set  of  seven  locations  for  the  independ- 
ent data  set  (272  cases).  Precipitation  was  divided  into  two  categories:  cate- 
gory 1 for  rain;  and  category  2 for  frozen  precipitation  consisting  of  snow,  freez- 
ing rain,  mixed  snow  and  rain,  and  ice  pellets.  Days  with  no  precipitation  were 
excluded  from  the  original  study  so  that  the  forecast  is  the  probability  of  frozen 
precipitation  given  that  precipitation  occurs.  The  forecast  metho  i was  developed 
three  ways  (two  direct  predictors,  modeled  marginal  predictor,  and  climatology  by 
station),  to  illustrate  the  concept  of  a marginal  predictor.  Details  of  the  three 
ways  are  given  in  the  following,  sections. 


7.2.1.  ’’rozen  Precipitation  with  Two  Direct  Preoictors . 


Thickness  and  station  elevation  were  transncrmalized  separately  and  used  as 
direct  predictors.  The  forecast  equation  -was  of  the  form: 


M 

c 


JI  M <1 

y - h1  E - A2  T 


(7.2) 


wh^re  c is  the  equivalent  normal  deviate  that  specifies  the  conditional  probability 
of  rain,  $"(c),  or  frozen  precipitation,  '(-3^.  Note  that  l-(c)  + (-c)  = 1 because 

of  the  symmetry  of  the  normal  distribution,  y = 0.4677;  y is  the  climatological 
probability  (y  = 0.68)  of  rain  versus  frozen  precipitation  for  the  locations  used  in 
the  development  sample: 
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Ax  = 0.066,  A2  = -0.397,  K = 0.90 

r.  is  the  END  of  elevation  given  by  the  cumulative  distribution  of  station  eleva- 
tions in  the  ievelopraent  sample.  T is  the  END  of  thickness.  E on:  T wen  i<  • rmiiw 
by  the  hirt.'.-x-afr.  !r.*crp-  lotion  noth  > > (met  ion  A. 3.1)  which  prove:  to  give  somewhat 
unsatisfactory  results  since  the  elevation  of  several  of  the  stations  in  the  inde- 
pendent sample  were  higher  than  any  of  the  stations  of  the  iependent  sample. 

'•  :■  ampler : An  elevation  of  154ra  and  a thickness  of  5340ra  gave  a 10#  probability  of 
frozen  precipitation  while  a 43m  elevation  ani  5210m  thickness  gave  a 
77#  probability  of  frozen  precipitation. 

The  dependent  data  verified  86.2#  correct  ani  with  a Brier  score  of  0.23.  The 
independent  data  verified  77.2#  correct  and  with  a Brier  score  of  0.40.  These 
verification  results  can  be  compared  with  other  results  by  referring  to  Table  2. 


Table  2.  Frozen 

Precipitation 

Verificat 

ion  Results. 

Deoendent 

Data 

Independent 

Data 

Method 

Per  Cent 
Correct 

Brier 

Score 

Per  Cent 
Correct 

Brie  r 
Score 

Discriminant  Analysis 

86.7 

0.19 

86.0 

0.20 

Two  Direct  Predictors 

86.2 

on 

OJ 

c 

77-2 

0.40 

One  Direct  and  One 
Marginal  Predictor 

88.7 

0 

ro 

0 

82.0 

O.33 

Unconditional 

Probability  Known 

89.9 

0.18 

87.I 

0.19 

'rozen  Precipitation  with  One  Direct  and  One  Marginal  Predictor. 

Elevation  should  not  be  used  as  a direct  predictor  because  the  elevation  values 
have  an  artificial  distribution,  i.e.,  the  values  were  not  determined  by  nature; 
man  selecte  ! the  station  locations  (and  consequently  the  elevations)  based  on 
various  economic  and  social  factors.  This  point  is  critical  since  it  governs  which 
preiict.rc  should  be  direct  predictors  (distribution  determined  by  nature)  and 
which  prciictors  must  be  marginal  (distribution  is  artificial  since  values  were 
“termir.cj  by  man). 

Using  elevation  as  a marginal  predictor,  the  forecast  equation  was  of  the  form: 


U _ 


(7.3) 


where  y is  now  a function  of  elevation  (E), 

y = - 0.000437E  + 0.5645 
A1  = R = 0.93 

't  = END  of  thickness  (transnormalized  using  direct  method.  Equation  (4.21)) 
T = - 20498.  + 116. 4t  - 0.2205T2  + 1.393T3 

It  was  determined  that  the  distribution  of  thickness  was  not  significantly  af- 
fected by  station  elevation  in  the  dependent  sample  although  this  would  not  be  the 
ease  if  the  stations  were  farther  apart.  The  third-order  polynomial  gave  the  best 
fit  since  the  thickness  distribution  was  found  to  be  skewed.  The  higher  accuracy 


3^ 
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of  the  direct  method  over  the  histogram  method  used  in  the  previous  section  accounts 
for  the  increase  in  correlation  from  0.90  to  0.93. 

E xamgles:  An  elevation  of  154m  and  a thickness  of  5?40m  gave  an  8%  probability  of 
frozen  precipitation  while  a 43m  ana  5210m  thickness  gave  an  8656  prob- 
ability of  frozen  precipitation. 

The  dependent  data  verified  88.7/5  correct  with  a Brier  score  of  0.20.  The  inde- 
pendent data  verified  82#  correct  with  a Brier  score  of  0.33  (see  Table  2). 

7.2.3.  Frozen  Precipitation  with  Unconditional  Probability  Known  for  Each  station. 

II 

Examination  of  y as  a function  of  elevation  showed  considerable  ieviation  at 
individual  stations.  This  suggests  that  various  local  effects  play  an  important 
role  in  the  probability  of  frozen  precipitation.  Thus,  each  station's  unconditional 
(climatological)  probability  of  frozen  precipitation  was  used  to  determine  a if  for 
that  station  in  Equation  (7.3)  with  the  results  that  the  dependent  data  verified 
39-9/5  correct  with  a Brier  score  of  O.lS,  and  the  independent  data  verified  87. 1# 
correct  with  a Brier  score  of  0.19.  Since  the  independent  data  were  used  to  get 
the  unconditional  probability  of  frozen  precipitation,  the  independent  results  are 
not  entirely  independent. 

7.2.4.  Probability  of  Frozen  Precipitation  with  Discriminant  Analysis. 

The  BMD05M  (Dixon,  1973)  computer  program  was  used  to  obtain  discriminant  analy- 
sis coefficients  for  the  probability  of  frozen  precipitation  using  thickness  and 
elevation  as  pi’edictors.  The  forecast  equation  is: 

EXP  (AX1  + A12  E + A13  T) 

P = EXP  (A1J_  + A12  E + A13  T)  + EXP  (A^  + A22  E + A?-<  7) 

where  P is  the  probability  of  frozen  precipitation 
E is  elevation 
T is  thickness 

and 


‘11  = - 3032- 

A21  = - 3221. 

‘12  = - °-032 

Agg  = - 0.087 

■13  = X1-64 

A?-s  = 12.00 

Examples : An  elevation  of  154m  and  a thickness  of  5340m  gave  a 19#  probability  of 
frozen  precipitation  ana  an  elevation  of  43m  and  a 5210m  thickness  gave 
a 94#  probability  of  frozen  precipitation. 

Note  that  Equation  (7.4)  cannot  be  used  directly  because  of  numerical  overflow. 
Instead,  a value  is  subtracted  from  each  exponential  argument  which  is  equivalent 
to  dividing  the  numerator  and  denominator  by  a constant. 

The  dependent  data  verified  86.7#  correct  with  a Brier  score  of  0.19.  The 
independent  data  verified  86.0#  correct  with  a Brier  score  of  0.20. 

7.2.5.  Frozen  Precipitation  Conclusions. 

The  important  result  of  this  example  is  that  it  shows  the  importance  of  local 
effects  in  a statistical  forecast  procedure  an  i the  ease  with  vfhich  these  effects 
are  taken  into  account  by  the  TRP  model.  The  small  differences  between  TRP  and 
discriminant  analysis  may  be  due  to  the  high  climatological  probability  (39#)  of 
frozen  precipitation. 


(7.4) 


* 

M 


3*5 


r 
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7.3.  Probability  af  Cell lug  Cato; , )ries  by  M : :<.•!  Output  Lai,  i t i . 

Data  for  this  example  were  supplied  by  the  Technique  Development  Laboratory  of 
the  National  Weather  Service.  The  predietani  was  the  1800Z  ceiling  as  specified  by 
the  five  categories  with  lower  limits  at  0,  200,  500,  1000,  an : 2000  feet  for  Cleve- 
land, Ohio,  for  December  and  January.  The  predictors  came  from  the  0000Z  run  of 
the  NMC  primitive  equation  (PE)  and  trajectory  models,  and  the  <y  00Z  ceiling  obser- 
vation. Predictors  derived  from  a numerical  weather  predictor  model  are  terme : 

Model  Output  Statistic  (MOS)  predictors.  The  MOS  predictors  were: 


Predictor 

Model 

Valia  Time 

Predictor 

1 

PE 

lSOOZ 

Precipitable  Water 

2 

PE 

1800Z 

Boundary  Layer  Divergence 

3 

PE 

1800Z 

Relative  Humidity  400-1000  mb 

4 

PE 

1800Z 

Precipitable  Water  (Smoothed  Over  9 Points) 

5 

Trajectory 

2400Z 

Surface  Relative  Humidity 

6 

Trajectory 

2400Z 

850-mb  Relative  Humidity 

7 

PE 

2400Z 

Relative  Humidity  400-1000  mb 

The  cosine  of  the  day  of  the  year  was  also  included  (predictor  8)  along  with  the 
0600Z  ceiling  observation  (predictor  9).  These  predictors  were  selected  at  the 
Technique  Development  Laboratory  by  stepwise  regression  using,  the  winter  (October 
through  ."arch)  seasons:  I969-I970,  1970-1971,  1971-1972,  and  1972-1973.  December 
1970  and  January  1971  (54  cases)  were  used  for  independent  data  and  was  withheld 
from  the  development  data  base.  The  Regression  Estimate  of  Event  Probability  (KEEP) 
development  set  consisted  of  all  the  development  data  base.  Only  the  December  an: 
January  development  data  (162  cases)  were  used  for  MDA  and  TRF. 

7.3.1.  TRF  Ceiling  Forecast  by  '-IQS. 

Correlation  between  all  pairs  of  the  nine  predictors  'was  calculated  by  the 
product  moment  formula  Equation  (5.2)  after  each  variable  had  been  transnormalized 
by  histogram  interpolation.  Equation  (4.5).  The  predictor  correlation  matrix  is 
given  in  Table  3- 


Table  3.  TRP  Predictor  Correlation  Matrix. 


Pre dictor 

1 

2 

3 

4 

5 

6 

7 

8 

9 

1 

1.000 

-0.531 

0.632 

0.953 

-C.035 

0.425 

0.623 

-0.263 

-0.035 

2 

-0.581 

1.00 

-0.690 

-0.475 

-0.044 

-0.667 

-0.649 

-0.043 

-0.053 

3 

0.632 

-O.69O 

1.000 

0.527 

0.169 

0.749 

0.902 

0.157 

-0.033 

4 

0.953 

-0.475 

0.527 

1.000 

-0.059 

0.354 

0.519 

-0.325 

-0.071 

5 

-0.035 

-0.044 

0 . 169 

-0.059 

1.000 

0.326 

0.015 

0.198 

-O.086 

6 

0.425 

-0.667 

0.749 

0.354 

0 . 326 

1.000 

0.614 

0.159 

0.C20 

7 

0.623 

-0.649 

0.902 

0.51° 

0.015 

0.614 

1.00C 

0.193 

-0.021 

8 

-0.263 

-0.043 

0.157 

-0.325 

0.198 

0.159 

0.193 

1.000 

-0.014 

9 

-0.035 

-0.053 

-0.033 

-0.071 

-0.088 

0.020 

-0.021 

-0.014 

1.000 

The  product  moment  formula,  Equation  (5.2),  also  was  used  to  arrive  at  the 
correlation  between  each  predictor  and  the  predietani.  However,  correlations  ap- 
peared to  be  unusually  low.  In  particular,  the  correlation  between  the  0600"  an  i 
1800Z  ceiling  observations  was  -0.13.  McCabe  (1968)  found  the  autocorrelation  of 
transnormalized  ceiling  to  be  0.A5  at  12  hours.  The  value  of  -0.13  was  found  to  be 
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due  to  large  numbers  of  clear  and  high  ceilings  at  Cleveland  in  December  and  January 
and  the  observing  bias  which  often  reports  high  ceilings,  e.g.,  25,000  feet,  as 
clear  at  0600Z  (midnight  local  time).  The  tetrachoric  correlation  formula.  Equa- 
tion (5*22),  with  the  dichotomy  division  at  3000  feet  overcame  this  problem  and  gave 
a correlation  of  0.479  which  is  quite  close  to  McCabe's  result  of  0.^5. 

Thus,  the  correlations  were  recalculated  using  the  tetrachoric  formula  when  both 
of  the  variables  were  ceilings  and  the  biserial  formula.  Equation  (5*7),  whenever 
one  of  the  variables  was  a ceiling.  The  predictor  intercorrelation  matrix  was 
unchanged  except  for  the  correlations  using  predictor  9 — the  0600Z  ceiling  obser- 
vation. The  predictor  9 intercorrelations  became  (1)  -0.007,  (2)  -0.003, 

(3)  -0.048,  (4)  -0.003,  (5)  0.157,  (6)  -0.077,  (7)  -0.017,  (8)  -0.038,  and  (9) 

1.000.  The  correlation  between  each  predictor  and  the  predictand  was  calculated 
to  be  (1)  0.084,  (2)  -0.284,  (3)  0.122,  (4)  0.076,  (5)  -0.028,  (6)  0.082,  (7)  O.O63, 
(8)  0.101,  and  (9)  0.479. 

Equations  (6.27),  (6.28),  and  (6.29)  were  used  for  the  forecast  equation: 

II 

„ Yi  ' M , , 

r*  s-  -*■  / r7  c \ 


a - r 


where  M is  the  "mean  predictor": 

M = A^  + Agirg  + . . . + AgPg 


A1 

= • 

-0.257 

a4 

= 0.264 

A7 

-0.624 

a2 

= . 

-0.433 

A5 

= -0.201 

II 

- CO 

< 

0.218 

a3 

= 

0.539 

a6 

- -0.139 

A9 

0.522 

R 

= 

0.644 

If 

y^  is  the 

END 

Of 

the  ith 

ceiling  category 

ya  = -2.6 

(O.i 

&); 

II 

, y2  = - 

1.97 

(2.5$),  y3  = 

-1.32 

(9-3$),  y^  = -0.97  (16.7$) 

Example:  Predictor  values,  Px  = 22.9,  Pg  = -26.7,  P,  = 99.0,  = 20.8,  Pg  = 82.2, 

P6  = 86.9,  P?  = 88.5,  Pg  = 0.988,  Pg  = 300,  gave  probabilities  of: 

(1)  0.0007,  (2)  0.05,  (3)  0.16,  (4)  0.15,  and  (5)  O.63. 

Verification  results  were  83.3$  correct  and  a Brier  score  of  0.27  on  the  dependent 
data  and  92.6$  correct  and  a Brier  score  of  0.14  on  the  independent  data.  Results 
can  be  compared  in  Table  4. 

Table  4.  Ceiling  Forecasts  Verification  Results. 

Dependent  Data.  Independent  Data 

Method  Per  Cent  Correct  Brier  Score  Per  Cent  Correct  Brier  Score 

TRP  83.3  0.27  92.6  0.14 

REEP  NOT  AVAILABLE  75-9  0.35 

MDA  67.3  0.50  70.4  0.47 
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Some  of  the  original  predictors  were  dichotomized  to  give  binary  (eouals  0 or  1) 
variables.  Some  of  the  variables  were  used  twice  with  different  dichotomies  givinr 
a total  of  12  new  predictors: 

PREDICTOR 

1.  1000-mb  PE  relative  humidity  (1800Z) 

2.  Ceiling  observation  (0600Z) 

3.  Trajectory  relative  humidity  (2400Z) 

A.  PE  precipitable  water  (l800Z) 

5.  Ceiling  observation  (0600Z) 

6.  PE  relative  humidity  (2400Z) 

7.  Relative  vorticity  (1800Z) 

8.  PE  relative  humidity  (2400Z) 

9.  850-mb  trajectory  relative  humidity  (2400Z) 

10.  PE  precipitable  water  (1800Z) 

11.  Station  elevation 

12.  Cosine  of  day  of  the  year 

7.3.2.  P.EEP  Ceiling,  Forecast  by  .'-'Of. 

Regression  Estimate  of  Event  Probabilities  (REEP)  is  one  of  the  main  methods 
used  by  the  National  Weather  Service  to  make  automated  forecasts  (Glahn,  1975). 

The  REEP  coefficients  were  calculated  and  the  method  verified  by  the  Techniques 
Development  Laboratory.  The  procedure  is  included  here  so  that  the  comparison  can 
be  made  with  TRP  results.  The  REEP  forecast  equation  is  of  the  form: 

Pr(i)  = + A^p-^  + A^pPg  + ...  + A^pP^2  {(A.) 

where  Pr(i)  is  the  probability  of  category  i and  Aj_j  x 100  is  given  in  Table  5. 
Verification  of  the  REEP  method  for  the  independent  data  was  75 .9$  correct  with  a 
Brier  score  of  0.35.  Verification  for  the  dependent  data  was  not  available. 


TYPE 

Continuous 
= 1 below  2000 

Continuous 
= 1 below  9 

= 1 below  200 

= 1 above  95 

= 1 above  -6 

= 1 above  90 

= 1 above  85 

= 1 above  6 

= 1 above  1000 

Continuous 


Table  5.  REEP  Coefficient  Matrix. 


idictor 

i = 1 

2 

3 

4 

5 

i = 0 

5.2 

18.20 

5.07 

-33.15 

1.03 

1 

-0.0004 

-0.07 

0.07 

0.18 

-0.24 

2 

0.42 

3.98 

5.28 

12 . 51 

-22.09 

3 

0.01 

0.07 

0.18 

0.30 

-0.55 

4 

-0.75 

•3-95 

-5.40 

1.22 

8.80 

5 

8.9 

9.24 

3.69 

-12.87 

-8.48 

6 

-1.6 

-10.40 

4 . 16 

2.3 

5.^5 

7 

0.31 

1.58 

3.64 

4.08 

-9.63 

8 

0.27 

-1.24 

-11.01 

1.55 

10.43 

9 

-3.3^ 

-2.34 

-3.14 

5.33 

3.^9 

10 

0.25 

-0.30 

-2 . 36 

-7.18 

9.58 

11 

-0.82 

-5.52 

-3.71 

3.9^ 

6.12 

12 

0.14 

2.75 

3.39 

4.45 

-10.73 
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7.3*5.  MPA  Ceiling  forecast  by  MOT. 

Multiple  Discriminant  Analysis  (MDA)  was  also  used  to  compare  the  verification 
results  of  the  TRP  model.  However,  the  lower  two  categories  were  grouped  together 
since  the  lowest  category  did  not  occur  in  the  development  data  base  and  MDA  re- 
quires some  development  data  in  each  category.  MDA  coefficients  were  calculated 
using  the  BMD05M  (Dixon,  1973)  computer  program.  Predictors  are  the  same  as  in 
section  7.3.1.  The  following  IDA  forecast  equation  was  used: 


where 

and 


ij 


Pr(i) 

EXP  (A.0  + 

AilPl  + Ai?P2  + 

...  + A 

iqPg) 

9 

I EXP  <AjO 
0=1 

+ AJ1P1  + Aj2P2 

+ . . . + 

Aj9p9) 

i)  is  the 

probability  of  the  ith  category 

is  the  value  of  the  jth 

predictor 

. is  as  gi 

ven  in  Table  6 . 

Table  6. 

MDA  Coefficient 

Matrix . 

Predictor 

i = 1 

2 

3 

4_ 

c_,. 

II 

O 

-263.0 

-255.0 

249.0 

-262 . 0 

i 

0.410 

0.609 

-0.030 

0.617 

2 

0.512 

0.452 

0 . 426 

0.442 

3 

0.022 

-0.028 

0.025 

0.022 

4 

-0.115 

-0.358 

0.332 

-0.465 

5 

0.034 

0.090 

0.079 

0.074 

6 

0 . 0003 

0.021 

-0.002 

0.00001 

7 

0 . 072 

0.114 

0.035 

0.098 

8 

527.0 

517.0 

509.0 

522.0 

9 

0.014 

0.011 

0.013 

0.016 

(7-7) 


The  coefficient  of  predictor  8 (cosine  of  the  day  of  the  year)  is  much  larger 
than  the  other  coefficients  beca.use  of  the  small  variation  in  the  cosine  for  Decem- 
ber and  January. 

Because  the  lowest  category  did  not  occur  and  was  given  very  small  probability 
forecasts  by  REEP  and  TRP,  it  is  possible  to  compare  the  four  category  MDA  per  cent 
correct  with  the  other  five  category  results.  The  Brier  score,  -wever,  needed  to 
be  modified  for  five  categories:  If  the  lowest  category  does  no.  ^ccur  and  would 
have  near  zero  probability,  then  the  equivalent  five  category  Brier  score  is  equal 
to  the  four  category  score  (set  r = d = 0 for  the  lowest  category  in  Eauation  (7.1) 
for  proof).  A rur.  of  the  TRP  model  with  four  categories  verified  this  result. 

Thus , the  MDA  verification  results  were  67.3#  correct  with  a Brier  score  of  0.50 
for  the  iependent  data  anti  70.4#  correct  and  a Brier  score  of  0.47  for  the  inde- 
pendent data  (see  Table  4). 

7.3.4.  Conclusions  for  MOP  Celling  Forecasts. 


The  independent  data  shows  improvement  of  TRP  over  REEP  and  MDA;  however,  the 
sample  size  was  relatively  small  and  further  tests  using  larger  samples  are  needed 
before  forming  definitive  conclusions.  Part  of  the  improvement  may  be  due  to  the 


r 
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rarity  of  the  lower  categories.  The  number  of  categories  is  not  important  in  the 
TRP  model  since  predictand  categories  are  not  used  in  the  development  of  the  coef- 
ficients. The  effect  of  many  categories  is  exemplified  by  the  number  of  parameters 
used  in  TRP  forecast  Equation  (7.5),  in  the  REEP  forecast  Equation  (7.6),  and  the 
MDA  forecast  Equation  (7.7). 
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Chapter  8 


CONCLUSIONS  AND  RECOMMENDATIONS  FOR  FUTURE  RESEARCH 


Transnormalized  regression  probability  (TRP)  shows  considerable  potential  as  an 
objective  probability  forecast  method.  It  combines  climatology  transnormalization 
with  the  flexibility  of  various  correlation  formulas  and  the  multivariate  normal 
theorems  on  regression  probability.  TRP  is  particularly  useful  with  highly  skewed 
variables  such  as  ceiling  or  wind  speed,  either  as  predictors  or  as  the  predictand. 
It  can  also  be  used  with  a large  number  of  changeable  predictand  categories.  In 
contrast,  REEP  and  MDA  require  many  coefficients  for  multiple  categories.  In  the 
applications  that  have  been  tried  to  date,  TRP  verified  better  on  indenendent  data 
than  did  REEP  or  MDA. 

Transnormalization  allows  for  a systematic  approach  to  diurnal  and  seasonal 
effects.  Because  local  effects  are  to  a large  extent  also  taken  into  account,  TRP 
can  be  used  for  several  locations  using  the  same  coefficients. 

TRP  makes  a distinction  between  the  predictors  that  are  determined  by  nature 
(direct  predictors)  and  the  ones  that  are  artificially  distributed  (marginal  pre- 
dictors). This  distinction  makes  TRP  particularly  well  suited  for  a passive  ob- 
servation science  such  as  meteorology  versus  a designed  experiment  science  such  as 
chemistry . 

A limited  number  of  operational  applications  of  the  TRP  model  have  been  made. 
Much  more  research  is  needed  to  determine  what  conditions  produce  the  best  results. 
The  transnormalized  correlation  matrix  lends  itself  to  stepwise  regression,  although 
this  technique  has  not  yet  been  tried.  Furthermore,  since  the  transnormalized 
variables  all  have  means  of  zero  and  unit  variance,  the  magnitude  of  a TRP  coeffi- 
cient is  related  to  the  importance  of  a given  predictor. 

Hermite  polynomials  of  the  second  kind  (Abramowitz  and  Stegun,  1964)  appear  to 
offer  a method  of  increasing  the  number  of  predictors  to  handle  types  of  nonlinear- 
ity that  straightforward  trar.snormalization  cannot  handle.  In  addition,  these 
othogonal  polynomials  permit  interaction  between  predictors  because  they  incorpo- 
rate a weighting  function  that  is  equal  to  the  normal  density  function. 

Further  work  is  needed  to  determine  which  transnormalization  methods  are  best 
suited  for  specific  meteorological  variables. 


, 
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PROBABILITY  FOR  AN  UNOBSERVED  EVENT 


Often  it  is  desirable  to  have  conditional  probability  estimates  for  predictor 
categories  that  have  never  been  observed.  If,  in  these  rare  cases,  the  probability 
density  of  a particular  frequency  being  the  true  frequency  is  distributed  according 


to  the  Poisson  distribution  for  zero  cases,  then: 


iPr  C f 1 0 ) 
du 


-u 


u — inf 


(A.l) 


m is  the  total  number  of  observations,  f is  the  frequency  of  occurrence  for  an 
infinitely  long  period  of  record,  and  P(f  |0)  is  the  probability  of  a particular  f 
being  the  true  frequency  given  that  zero  cases  have  occurred. 

Equation  (6.22)  or  Equation  (6.23)  could  now  be  integrated  over  all  possible 
values  each  weighted  by  the  probability  of  its  being  the  true  value: 


X. 


Pr(Y|X)  = j Pr(Xg!0)  Lp7py  \ 0(X)  *(g)  dx 

X 


ax 


x-, 


*■1 


Instead,  a much  simpler  scheme  can  be  used, 
the  median  value  of  u,  designated  as  u: 


0.3  = \ e' 

6 


du. 


(A. 2) 

Equation  (A.l)  is  integrated  to  find 

u = 0.7  (A. 3) 


Since  mf  = u,  the  median  frequency  f = 0.7/m.  A frequency  higher  than  the  median 
frequency  tends  to  favor  forecasting  higher  categories,  a lower  frequency  favors 
lower  categories.  Thus,  using  the  median  frequency  is  "optimal"  in  the  sense  that 
as  more  data  becomes  available,  half  of  the  probability  forecasts  will  favor  lower 
categories  and  half  will  favor  higher  categories.  In  using  the  median  frequency. 
Equation  (6.20)  is  used  directly  without  any  integration. 
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TAYLOR  SERIES  ESTIMATES  FOR  JOHNSON'S  LOGNORMAL  AND  HYPERBOLIC  SINE  DISTRIBUTIONS 

The  Lognormal  Distribution 

The  lognormal  distribution  is  ol‘  the  Equation  (9.25)  form: 

x = a + b ?-(x  + K)  (B.l) 

If  K is  known,  a and  b can  be  easily  estimated  by  the  method  of  transnornalizeu 
quantile  fitting  (Chapter  4)  by  using  a dummy  variable,  Q = Pr(x  + K),  and  x = a + b£. 
An  estimate  of  K can  be  obtained  by  expanding  Equation  (3.1)  in  a Taylor  series  ami 
solving  for  K in  terms  of  the  coefficients  of  a cubic  transnorinalized  quantile  fit: 

x = A + Bx  + Cx2  + Dx:;  (3.2) 

The  Taylor  series  of  Equation  (B.l)  is: 

!:  = (a  + b K)  + (b/K)  x + (-  i b/K2)  x2  + (i  o/K3)  x3  ( . :) 

• D 

•ror.  Equations  (3.2)  and  (3.3): 


C 

D 


Johnson's  (1949)  hyperbolic  sine  distribution  is  of  the  Equation  (4.26)  form: 

x = a + b Sir.h-^  (gx  + h)  ( •<-  ) 

piven  g and  h,  find  a and  b by  using  a dummy  variable,  w = Sinh(gx  + h),  aid 
x = a + bQ.  The  third-order  Taylor  series  of  31  = e + b Sinh(Y)  is: 

x = a + bY  - bY3/6  (•■7) 


an!  setting  Y = gx  + h: 
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B _ gb  - gh^b/2  -2  h 

C - bg?h/2  Kh  e 

Solving  in  Equation  (B.10)  and  substituting  in  Equation  (B.ll): 

^2  _C_ 

C gh  3D 

Solving  gh: 


(B. 12) 


?D  - B/C 


(B.13) 


But  from  Equation  (B.10),  g = 3hD/C,  so  that: 


h(3hD/C) 


or  h = 


C - 3DB 


(B.l9) 


C - 3DB 


(B.15) 


If  the  quantity  under  the  radical  is  negative,  h and  g will  have  to  change  sign  tc 
satisfy  Equation  (B.13).  Thus,  if: 


C - 3L)3 


./  C - 3DB 


(3.16) 


The  above  estimates  for  , , h,  and  K give  approximations  that  are  close  enough  for 
most  forecast  problems  in  which  the  error  due  to  other  factors  is  greater  than  the 
rr:  r introduce  i by  the  approximations.  However,  with  highly  reliabl<  iata,  the 
iterative  methods  described  by  Johnson  (1999)  an  i Ord  (1972)  nay  be  worth  the  extra 
effort. 
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CALCULATION  OR  MOMENTS  AND  CUMULANT.C 


If  the  algebraic  form  of  the  freouency  iistribution  is  unknown,  there  are  gen- 
eral meth  Is  for  fitting  a Iistribution.  These  neth  ><ie,  for  the  most  pa  rt , depend 
first  on  calculating  the  moments  of  the  variable.  Four  moments  are  usually  used 
as  large  sampling  errors  are  encountered  when  using  a greater  number  of  moments. 
Using  fewer  moments  also  gives  a desirable  smoothing  to  the  iistribution.  Moments 
are  calculated  as  follows: 

Let  T = total  number  of  observations 

V(I)  = value  of  the  Ith  observation 

Moments  about  the  origin  are  calculated  (summations  arc  from  I = 1 to  T,  A,  is  the 
first  moment,  Ag  the  seconi,  etc.): 


A2  = (l/T)  I V(I) 
a2  = (i/T)  r v(i)2 
a3  = ( i/t ) y.  v(i y 

A4  = (l/T)  T V(I)4 


(C.l) 


Raw  moments  (R)  about  the  mean  are: 


Rx  = 0 


Rg  = Ag  - 


R,  = A.,  - 3A^Ag  + 2A^ 


2 A 

R^  = Ai(  - 4A^A3  - bAj  Ag  - 3AX 


(C.2) 


According  to  Cramer  (19^f , p.  352),  consistent  and  unbiased  estimates  of  the  central 
moments  are: 


Mg  = RgT/(T  - i) 

M3  = R3T2/[(T  - 1) (T  - 2)) 

[R,,(T2  - 2T  - 2)  - R2(6T  - 9)]  T 

M h ~ ( ^ * 3 ) 

(T-l^T-2'fT-3) 
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Cumulants  (K)  are  also  use  i in  describin''  iistributi' ns  (Cornish  and  Fisher,  1937). 
Cunulants  are  also  calle  i semi-invariantr  r Fisher’s  K statistics.  Unbiased 
estimates  are  calculate.: 


K1  = A1  , Kg  = Kg,  K3  = ttj 

, «,<*-'-■  - (c.„ 

(T  - 1 ) (T  - 2) (T  - 3) 


Kendall  and  Stuart  (1358)  recommenl  a correction  to  the  cumulants  h?  an:  K^,  similar 
to  Sheppard’s  corrections  for  moments,  when  the  cumulants  have  been^calculate o from 
iata  that  has  seen  lumped  into  classes  of  size  h.  The  corrected  cumulants  (K'j 
are : 


Kjf  - K?  - hr/12 

K/  = \ - h^/120  (C.5) 


The  corrected  versions  should  be  used  when  data  have  been  croupe;  into  equal 
classes . 
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ERRATA  SHUT 

AWS-TR-7 5-259,  "Tranaonormalized  Regression  Probability,”  by  Capt  Albert  R. 
Boehm,  December  1976,  52p  (AD-A047720)  is  changed  as  follows: 

1.  Page  11,  caption  for  Figure  6,  Line  9 should  read: 
occurred  less  than  y^  - M(Xj)  because  x 

2.  Page  14,  Equation  4.8,  third  term  In  parenthesis  should  be  -SK3/2  Instead 

of  -SK3/6. 


3. 

4. 

5. 

6. 
7. 


Page  14,  Equation  4.9,  Aj  Instead  of  H|. 

Page  14  the  line  below  Equation  4.9 
A j and  M2  are  the  mean  and  variance  (Appendix  C), 

Page  14,  Equation  4.11  first  term  in  parenthesis  K3/6  Instead  of  SK3/6. 
Page  15,  Equation  4.13,  - f (x-a)/(b-cx-dx2) 

Page  15,  Equation  4.14  the  numerator  should  be:  B 1 2'*'3) 2 


8.  Page  15,  Equation  4.16, 

P ■ t-(a  +bt )/( I+ct+dt ). . .and  add:  for  0 < P < .5) 

9.  Page  16,  second  line  after  Equation  4.18  should  read  A1  - ...  +K4M2"2/8 

instead  of  A1  » ...-K4M2  . 

10.  Page  16,  Line  5,  after  Equation  4.18...  s - (x-AjJ/ZM^. 

11.  Page  16,  Line  6,  after  Equation  4.18  Aj  and  M2  are  the  mean  and  standard 
deviation. 

12.  Page  20,  after  Equation  4.30  add:  P.  Hicks  showed  this  is  mathematically 
equivalent  to: 

P(e)  - 1/(1  + EXP(-1.6  e - ,07e3))  (A,30b) 


13.  Page  24,  line  after  Equation  5.19  should  be: 
Ri  * -xRi-i/U+1)  - (i  -1)  R1_2/[i(i+l)]. 

14.  Page  47,  Equation  A. 3 0.5  ■ instead  of  0.3  ». 


